[
https://issues.apache.org/jira/browse/IMPALA-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814294#comment-16814294
]
Csaba Ringhofer commented on IMPALA-8402:
-----------------------------------------
An idea for perf testing:
A special version of TPC-H could be created where some STRING columns are
systematically replaced with CHAR(N).
- There are some 1 character flag columns, e.g. lineitem.l_returnflag. Using
CHAR(1) seems like a clear optimization in this case.
- Dates are currently stored as strings with a fixed length of 10 (e.g.
lineitem.l_shipdate). Reducing the size of the slot from 12 to 10 bytes and
removing the indirection could probably make some queries faster. Note that the
DATE type is in progress (https://gerrit.cloudera.org/#/c/12481/), and it
will be a much better way to optimize these columns (4-byte slot, same
representation as in Parquet).
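
As a rough sketch of how such a variant could be generated, the Python below
builds a CREATE TABLE AS SELECT statement that casts selected STRING columns to
CHAR(N). The helper name char_variant_ctas, the target table name
lineitem_char, and the abbreviated column list are hypothetical and only for
illustration; the two CHAR columns are the ones mentioned above.

```python
# Hypothetical mapping of column name -> CHAR length, taken from the two
# examples in this comment (a 1-character flag and a 10-character date string).
CHAR_COLUMNS = {
    "l_returnflag": 1,
    "l_shipdate": 10,
}


def char_variant_ctas(src_table, dst_table, all_columns, char_columns):
    """Build a CTAS statement that casts the chosen columns to CHAR(N).

    Columns that are not listed in char_columns are copied unchanged.
    """
    select_items = []
    for col in all_columns:
        if col in char_columns:
            length = char_columns[col]
            select_items.append(f"CAST({col} AS CHAR({length})) AS {col}")
        else:
            select_items.append(col)
    select_list = ",\n  ".join(select_items)
    return (
        f"CREATE TABLE {dst_table} STORED AS PARQUET AS\n"
        f"SELECT\n  {select_list}\nFROM {src_table}"
    )


if __name__ == "__main__":
    # Abbreviated column list for illustration; a real run would list every
    # column of the source table.
    columns = ["l_orderkey", "l_returnflag", "l_shipdate"]
    print(char_variant_ctas("lineitem", "lineitem_char", columns, CHAR_COLUMNS))
```

The generated statement would still need to be checked against the target
Impala version before being used in a perf run.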
Apart from codegen, some Parquet optimizations are also not used for CHAR(N),
while they are used for STRING:
- dictionary filtering
- min/max stats (neither the row-group-level stats nor the in-progress
page-level stats)
I think that these should not affect TPC-H/TPC-DS too much.
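
To illustrate what min/max stats buy for an equality predicate, here is a
minimal Python sketch of the general technique. It is not Impala's
implementation; the row group ids and the min/max values are made up for the
example.

```python
# Hypothetical per-row-group statistics for a date-like string column.
ROW_GROUPS = [
    {"id": 0, "min": "1992-01-02", "max": "1994-06-30"},
    {"id": 1, "min": "1994-07-01", "max": "1996-12-31"},
    {"id": 2, "min": "1997-01-01", "max": "1998-12-01"},
]


def may_contain(stats_min, stats_max, value):
    """Return False only if the stats prove that value cannot be present."""
    return stats_min <= value <= stats_max


def row_groups_to_scan(row_groups, value):
    """Return ids of row groups that cannot be skipped for `col = value`."""
    return [
        rg["id"]
        for rg in row_groups
        if may_contain(rg["min"], rg["max"], value)
    ]


if __name__ == "__main__":
    # Only the row group whose [min, max] range covers the value is scanned.
    print(row_groups_to_scan(ROW_GROUPS, "1995-03-15"))
```

When this kind of check is not applied to a column type, every row group has
to be read and filtered row by row.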
> Add targeted perf tests for CHAR
> --------------------------------
>
> Key: IMPALA-8402
> URL: https://issues.apache.org/jira/browse/IMPALA-8402
> Project: IMPALA
> Issue Type: Improvement
> Components: Infrastructure
> Reporter: Tim Armstrong
> Priority: Minor
> Labels: perf
>
> This is a follow-on from IMPALA-7331. We don't currently have targeted perf
> coverage for CHAR as a result of not having any CHAR-type columns in TPC-H or
> TPC-DS.
> CHAR is not a preferred data type in Impala because of the "interesting"
> space-padding semantics and limited codegen support, but people do use it in
> production. It would be nice to have a way to measure perf regressions or
> take credit for perf wins if/when we enable codegen for it.