[
https://issues.apache.org/jira/browse/IMPALA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Csaba Ringhofer updated IMPALA-340:
-----------------------------------
Description:
We currently store string data outside of a Tuple, with the string slot taking
up 16 bytes (4 bytes length, 8 bytes pointer, -4 bytes padding- (UPDATE:
IMPALA-7367 removed the padding, leaving 12 bytes)), which is hugely wasteful.
We need 2 improvements:
1. A more compact string slot: Intel architectures only use 48 bits of a 64-bit
   address, and strings are usually smaller than 64K. When the length fits in
   16 bits, we should pack the string slot into 64 bits total (48-bit pointer
   plus 16-bit length).
2. An in-line representation of strings: schemas we've seen often use strings
   as ids (which then also show up as foreign keys and are used heavily in
   joins), and those are typically smaller than 8 bytes. In that case, we could
   simply store the actual data in the string slot itself.
See benchmarks/string-benchmark.cc.
See IMP-148 for more details.
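The two improvements could be combined in a single 8-byte slot. The following
is only a rough sketch under stated assumptions, not Impala's actual
StringValue: PackedStringSlot, Pack, IsInline, Length and ToString are
hypothetical names, and the layout (length in the top 16 bits, payload in the
low 48 bits, strings of at most 6 bytes stored in-line) is one possible choice
among several.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical 64-bit string slot (illustration only, not Impala code).
// Assumed layout: bits 63..48 = length, bits 47..0 = payload.
// The payload is the string bytes themselves when length <= 6,
// otherwise the low 48 bits of a pointer to the out-of-line data.
constexpr int kPayloadBits = 48;
constexpr uint64_t kPayloadMask = (uint64_t{1} << kPayloadBits) - 1;
constexpr size_t kMaxInlineLen = 6;   // 6 bytes fit in the 48-bit payload
constexpr size_t kMaxLen = 0xFFFF;    // length must fit in 16 bits

struct PackedStringSlot {
  uint64_t bits;
};
static_assert(sizeof(PackedStringSlot) == 8, "slot must be 64 bits");

// Packs (data, len) into *out. Returns false when the string is too long or
// the pointer does not fit in 48 bits; a caller would then need a fallback
// representation.
inline bool Pack(const char* data, size_t len, PackedStringSlot* out) {
  assert(out != nullptr);
  if (len > kMaxLen) return false;
  uint64_t payload = 0;
  if (len <= kMaxInlineLen) {
    // In-line case: byte i goes into payload bits [8*i, 8*i + 7].
    for (size_t i = 0; i < len; ++i) {
      payload |= uint64_t{static_cast<unsigned char>(data[i])} << (8 * i);
    }
  } else {
    payload = reinterpret_cast<uintptr_t>(data);
    if ((payload & ~kPayloadMask) != 0) return false;  // needs more than 48 bits
  }
  out->bits = (static_cast<uint64_t>(len) << kPayloadBits) | payload;
  return true;
}

inline size_t Length(const PackedStringSlot& s) {
  return static_cast<size_t>(s.bits >> kPayloadBits);
}

inline bool IsInline(const PackedStringSlot& s) {
  return Length(s) <= kMaxInlineLen;
}

// Materializes the string, reading either the in-line bytes or the pointer.
inline std::string ToString(const PackedStringSlot& s) {
  const size_t len = Length(s);
  const uint64_t payload = s.bits & kPayloadMask;
  if (len <= kMaxInlineLen) {
    std::string result(len, '\0');
    for (size_t i = 0; i < len; ++i) {
      result[i] = static_cast<char>((payload >> (8 * i)) & 0xFF);
    }
    return result;
  }
  const char* ptr = reinterpret_cast<const char*>(static_cast<uintptr_t>(payload));
  return std::string(ptr, len);
}
```

Using the length itself to decide between the in-line and pointer forms avoids
spending a bit on a separate flag, at the cost of limiting in-line strings to
6 bytes. Comparing two in-line ids then reduces to comparing two 64-bit
integers, which is the case the join-heavy workloads above would benefit from.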
was:
We currently store string data outside of a Tuple, with the string slot taking
up 16 bytes (4 bytes length, 8 bytes pointer, 4 bytes padding), which is hugely
wasteful.
We need 2 improvements:
1. A more compact string slot: Intel architectures only use 48 bits of a 64-bit
   address, and strings are usually smaller than 64K. When the length fits in
   16 bits, we should pack the string slot into 64 bits total (48-bit pointer
   plus 16-bit length).
2. An in-line representation of strings: schemas we've seen often use strings
   as ids (which then also show up as foreign keys and are used heavily in
   joins), and those are typically smaller than 8 bytes. In that case, we could
   simply store the actual data in the string slot itself.
See benchmarks/string-benchmark.cc.
See IMP-148 for more details.
> Improve internal format of strings
> ----------------------------------
>
> Key: IMPALA-340
> URL: https://issues.apache.org/jira/browse/IMPALA-340
> Project: IMPALA
> Issue Type: Task
> Components: Backend
> Affects Versions: Impala 1.0
> Reporter: Nong Li
> Priority: Minor
> Labels: perfomance
>
> We currently store string data outside of a Tuple, with the string slot
> taking up 16 bytes (4 bytes length, 8 bytes pointer, -4 bytes padding-
> (UPDATE: IMPALA-7367 removed the padding, leaving 12 bytes)), which is
> hugely wasteful.
> We need 2 improvements:
> 1. A more compact string slot: Intel architectures only use 48 bits of a
>    64-bit address, and strings are usually smaller than 64K. When the length
>    fits in 16 bits, we should pack the string slot into 64 bits total
>    (48-bit pointer plus 16-bit length).
> 2. An in-line representation of strings: schemas we've seen often use strings
>    as ids (which then also show up as foreign keys and are used heavily in
>    joins), and those are typically smaller than 8 bytes. In that case, we
>    could simply store the actual data in the string slot itself.
> See benchmarks/string-benchmark.cc.
> See IMP-148 for more details.