[
https://issues.apache.org/jira/browse/IMPALA-12373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755201#comment-17755201
]
Joe McDonnell commented on IMPALA-12373:
----------------------------------------
That's a cool idea. I believe that libc++ has a more complicated implementation
that also uses some of the bytes of the length. If we don't need
null-termination, we might be able to fit 11 chars in the 12 bytes (1 bit to
say it's a small string, 7 bits for length, the rest for data). If we need
null termination, then 10 chars.
If we can get to 10 characters, that helps with dates like YYYY/MM/DD.
Looking at TPC-H, there are several columns that have lengths that would fit.
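A minimal sketch of the 12-byte layout described above (1 flag bit, 7 length bits, 11 data bytes, no null terminator) could look like the following. This is an illustration only, not Impala's StringValue: the class name SmallStringSketch and its members are hypothetical, and the sketch assumes 8-byte pointers and long-string lengths below 2^31 so that the top bit of the last byte is free to act as the flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical 12-byte string value; NOT Impala's actual StringValue.
// Long form:  bytes 0..7 hold the pointer, bytes 8..11 hold the length
//             (written least-significant byte first, top bit always 0).
// Small form: bytes 0..10 hold the characters, byte 11 holds
//             0x80 (the "small" flag) | length (0..11).
class SmallStringSketch {
 public:
  static constexpr std::uint32_t kMaxSmallLen = 11;

  SmallStringSketch(const char* data, std::uint32_t len) {
    std::memset(bytes_, 0, sizeof(bytes_));
    if (len <= kMaxSmallLen) {
      if (len > 0) std::memcpy(bytes_, data, len);
      bytes_[kTagIndex] = static_cast<unsigned char>(kSmallBit | len);
    } else {
      // Assumption: long lengths fit in 31 bits, so the flag bit stays clear.
      assert(len <= 0x7FFFFFFFu);
      std::memcpy(bytes_, &data, sizeof(data));
      for (int i = 0; i < 4; ++i) {
        bytes_[8 + i] = static_cast<unsigned char>((len >> (8 * i)) & 0xFF);
      }
    }
  }

  bool is_small() const { return (bytes_[kTagIndex] & kSmallBit) != 0; }

  std::uint32_t size() const {
    if (is_small()) return bytes_[kTagIndex] & 0x7Fu;
    std::uint32_t len = 0;
    for (int i = 0; i < 4; ++i) {
      len |= static_cast<std::uint32_t>(bytes_[8 + i]) << (8 * i);
    }
    return len;
  }

  // Not null-terminated; always pair with size().
  const char* data() const {
    if (is_small()) return reinterpret_cast<const char*>(bytes_);
    const char* ptr = nullptr;
    std::memcpy(&ptr, bytes_, sizeof(ptr));
    return ptr;
  }

 private:
  static constexpr std::size_t kTagIndex = 11;
  static constexpr unsigned char kSmallBit = 0x80;
  unsigned char bytes_[12];
};

static_assert(sizeof(char*) == 8, "sketch assumes 8-byte pointers");
static_assert(sizeof(SmallStringSketch) == 12, "layout must stay 12 bytes");
```

With this layout a date such as "2023/08/16" (10 characters) is stored inline, while anything longer than 11 bytes keeps the pointer-plus-length form. Reserving one byte for a null terminator would lower the inline limit to 10 characters, which still covers YYYY/MM/DD.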
> Implement Small String Optimization for StringValue
> ---------------------------------------------------
>
> Key: IMPALA-12373
> URL: https://issues.apache.org/jira/browse/IMPALA-12373
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Zoltán Borók-Nagy
> Priority: Major
>
> Implement Small String Optimization for StringValue.
> Current memory layout of StringValue is:
> {noformat}
> char* ptr; // 8 bytes
> int len; // 4 bytes
> {noformat}
> For small strings of up to 8 bytes we could store the string contents in the
> bytes of the 'ptr'. Something like this:
> {noformat}
> union {
> char* ptr;
> char small_buf[sizeof(ptr)];
> };
> int len;
> {noformat}
> Many C++ string implementations use the {{Small String Optimization}} to
> speed up work with small strings, for example Microsoft STL, libstdc++,
> libc++, Boost, and Folly.