[ 
https://issues.apache.org/jira/browse/IMPALA-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814294#comment-16814294
 ] 

Csaba Ringhofer commented on IMPALA-8402:
-----------------------------------------

An idea for perf testing:
A special version of TPC-H could be created where some STRING columns are 
systematically replaced with CHAR(N).
- There are some 1-character flag columns, e.g. lineitem.l_returnflag. Using 
CHAR(1) seems like a clear optimization in this case.
- Dates are currently stored as strings with a fixed length of 10 (e.g. 
lineitem.l_shipdate). Reducing the slot size from 12 to 10 bytes and 
removing the indirection could probably make some queries faster. Note that the 
DATE type is in progress ( https://gerrit.cloudera.org/#/c/12481/ ), and it 
will be a much better way to optimize these columns (4 byte slot, same 
representation as in Parquet).
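
A minimal sketch of how such a CHAR(N) variant could be generated, assuming the
tables are rebuilt with CREATE TABLE AS SELECT and a CAST per replaced column.
The database names (tpch, tpch_char), the helper name char_variant_ctas, and the
CHAR_COLUMNS mapping are hypothetical; only l_returnflag and l_shipdate come
from the comment above, and the statement shape has not been checked against a
particular Impala version.

```python
# Hypothetical mapping: TPC-H table -> {column: CHAR length}.
# Only l_returnflag (1-character flag) and l_shipdate (10-character date
# string) are named in the comment; other columns would be added the same way.
CHAR_COLUMNS = {
    "lineitem": {"l_returnflag": 1, "l_shipdate": 10},
}


def char_variant_ctas(table, all_columns, src_db="tpch", dst_db="tpch_char"):
    """Build a CTAS statement that casts the selected columns to CHAR(N).

    src_db and dst_db are assumed database names, not real defaults.
    """
    overrides = CHAR_COLUMNS.get(table, {})
    select_items = []
    for col in all_columns:
        if col in overrides:
            # Replace the STRING column with a fixed-length CHAR(N) column.
            select_items.append(
                f"CAST({col} AS CHAR({overrides[col]})) AS {col}"
            )
        else:
            # Leave every other column untouched.
            select_items.append(col)
    select_list = ",\n  ".join(select_items)
    return (
        f"CREATE TABLE {dst_db}.{table} STORED AS PARQUET AS\n"
        f"SELECT\n  {select_list}\nFROM {src_db}.{table}"
    )


if __name__ == "__main__":
    # Abbreviated column list for illustration; a real run would pass all
    # lineitem columns.
    print(char_variant_ctas("lineitem",
                            ["l_orderkey", "l_returnflag", "l_shipdate"]))
```

Keeping the mapping in one place would make it easy to produce the STRING and
CHAR(N) variants from the same source data and run identical queries against
both.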

Apart from codegen, some Parquet optimizations are also not used for CHAR(N), 
while they are used for STRING:
- dictionary filtering
- min/max stats (neither row group level stats nor the in-progress page level 
stats)

I think that these should not affect TPC-H/TPC-DS too much.

> Add targeted perf tests for CHAR
> --------------------------------
>
>                 Key: IMPALA-8402
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8402
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Infrastructure
>            Reporter: Tim Armstrong
>            Priority: Minor
>              Labels: perf
>
> This is a follow-on from IMPALA-7331. We don't currently have targeted perf 
> coverage for CHAR as a result of not having any CHAR-type columns in TPC-H or 
> TPC-DS.
> CHAR is not a preferred data type in Impala because of the "interesting" 
> space-padding semantics and limited codegen support, but people do use it in 
> production. It would be nice to have a way to measure perf regressions or 
> take credit for perf wins if/when we enable codegen for it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

