[
https://issues.apache.org/jira/browse/IMPALA-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joe McDonnell updated IMPALA-9413:
----------------------------------
Description:
GCC 5+ uses a new ABI for std::string that includes a small string
optimization. Strings of up to 15 characters are stored in a buffer inside the
string object itself, which avoids a separate heap allocation. As a result,
string.capacity() reports 15 for such strings even though no memory beyond
sizeof(string) is in use, so memory usage calculations that add
sizeof(string) + string.capacity() overcount. This happens in the query result
set:
[https://github.com/apache/impala/blob/master/be/src/service/query-result-set.cc#L239-L241]
At the moment, Impala uses GCC 4.9.2, which does not have this optimization, so
this is only a problem when we switch to the new ABI.
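One way the accounting could be made independent of the ABI is sketched below. This is a minimal illustration, not Impala code: the helper names UsesInlineBuffer and StringMemoryUsage are hypothetical, and the check that data() points inside the string object is a common heuristic rather than something the C++ standard guarantees.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Impala): returns true if the character
// buffer lives inside the std::string object itself, which is the case for
// short strings under a small string optimization.
inline bool UsesInlineBuffer(const std::string& s) {
  const auto object_start = reinterpret_cast<std::uintptr_t>(&s);
  const auto buffer_start = reinterpret_cast<std::uintptr_t>(s.data());
  return buffer_start >= object_start &&
         buffer_start < object_start + sizeof(std::string);
}

// Hypothetical helper: estimates the memory attributable to a string.
// capacity() is only added when the buffer is outside the object, so an
// inline buffer is not counted twice.
inline std::size_t StringMemoryUsage(const std::string& s) {
  std::size_t bytes = sizeof(std::string);
  if (!UsesInlineBuffer(s)) bytes += s.capacity();
  return bytes;
}
```

With a string implementation that has no inline buffer, UsesInlineBuffer() is expected to return false for every string, so the result is the same sizeof(string) + string.capacity() sum as before.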
I have attached a simple C++ file to demonstrate the difference. On GCC 4.9.2,
the output is:
{noformat}
joe@joemcdonnell:~/view2/Impala/stringcapacity$ ./a.out
init short_string
[Allocating 30 bytes]
sizeof(short_string): 8
short_string.size(): 5
short_string.capacity(): 5
init long_string
[Allocating 54 bytes]
sizeof(long_string): 8
long_string.size(): 29
long_string.capacity(): 29
{noformat}
On GCC 5.4.0:
{noformat}
init short_string
sizeof(short_string): 32
short_string.size(): 5
short_string.capacity(): 15
init long_string
[Allocating 30 bytes]
sizeof(long_string): 32
long_string.size(): 29
long_string.capacity(): 29
{noformat}
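For readers without the attachment, a program along the following lines could produce output of the same shape. This is a sketch under assumptions, not the attached stringcapacity.cc (whose contents are not part of this message): it replaces the global operator new to log allocations, and the names g_trace, g_traced_allocations and PrintStringStats are made up for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>
#include <string>

// Hypothetical tracing state: allocations are logged only while g_trace is set.
static bool g_trace = false;
static std::size_t g_traced_allocations = 0;

// Replacement global operator new that reports each traced allocation.
void* operator new(std::size_t size) {
  if (g_trace) {
    ++g_traced_allocations;
    std::printf("[Allocating %zu bytes]\n", size);
  }
  void* p = std::malloc(size == 0 ? 1 : size);
  if (p == nullptr) throw std::bad_alloc();
  return p;
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Constructs a std::string from `contents`, prints its sizeof, size() and
// capacity(), and returns how many heap allocations the construction made.
std::size_t PrintStringStats(const char* name, const char* contents) {
  std::printf("init %s\n", name);
  g_traced_allocations = 0;
  g_trace = true;
  std::string s(contents);
  g_trace = false;
  std::printf("sizeof(%s): %zu\n", name, sizeof(s));
  std::printf("%s.size(): %zu\n", name, s.size());
  std::printf("%s.capacity(): %zu\n", name, s.capacity());
  return g_traced_allocations;
}
```

Calling PrintStringStats("short_string", "hello") and then PrintStringStats with a 29-character literal from main() should print lines in the same format as the output above. The exact byte counts and capacities depend on the compiler, the standard library and the ABI in use.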
was:
GCC5+ uses a new ABI for std::string which has a small string optimization.
This allows it to avoid an extra memory allocation for strings up to 15
characters. This means that string.capacity() is 15 while still only using
sizeof(string), so calculations of memory usage that add sizeof(string) +
string.capacity are no longer correct. This happens in the query result set:
[https://github.com/apache/impala/blob/master/be/src/service/query-result-set.cc#L225-L232]
[https://github.com/apache/impala/blob/master/be/src/service/query-result-set.cc#L239-L241]
At the moment, Impala uses GCC 4.9.2, which does not have this optimization, so
this is only a problem when we switch to the new ABI.
I have attached a simple c++ file to demonstrate the difference. On GCC-4.9.2,
the output is:
{noformat}
joe@joemcdonnell:~/view2/Impala/stringcapacity$ ./a.out
init short_string
[Allocating 30 bytes]
sizeof(short_string): 8
short_string.size(): 5
short_string.capacity(): 5
init long_string
[Allocating 54 bytes]
sizeof(long_string): 8
long_string.size(): 29
long_string.capacity(): 29
{noformat}
On GCC 5.4.0:
{noformat}
init short_string
sizeof(short_string): 32
short_string.size(): 5
short_string.capacity(): 15
init long_string
[Allocating 30 bytes]
sizeof(long_string): 32
long_string.size(): 29
long_string.capacity(): 29
{noformat}
> Calculation of result set memory usage is wrong when using GCC7 with new ABI
> ----------------------------------------------------------------------------
>
> Key: IMPALA-9413
> URL: https://issues.apache.org/jira/browse/IMPALA-9413
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.4.0
> Reporter: Joe McDonnell
> Priority: Major
> Attachments: stringcapacity.cc
>
>