[ 
https://issues.apache.org/jira/browse/IMPALA-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743149#comment-16743149
 ] 

Csaba Ringhofer commented on IMPALA-4994:
-----------------------------------------

I have a possible performance concern about CHAR(N): doing the conversion 
during dictionary construction could make the dictionary much larger if N is 
large, and it would need a lot of copying. The current solution of treating 
CHAR(N) the same way as STRING during dictionary construction has the 
advantage that no copying is necessary: the dictionary simply contains 
pointers into the buffer that holds the strings. Meanwhile, the conversion 
itself is quite cheap (especially compared to TIMESTAMP), so we wouldn't win 
much by doing it for fewer elements.
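
To make the size concern concrete, here is a minimal sketch of what a 
converted CHAR(N) dictionary would have to store. The names (ToCharN, 
BuildPaddedDict) are hypothetical and are not Impala's actual decoder code; 
the sketch only assumes that conversion means truncating to N bytes or 
padding with spaces up to N.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: convert one value to CHAR(n) by truncating to n bytes
// or padding with trailing spaces up to n bytes.
std::string ToCharN(std::string_view v, std::size_t n) {
  assert(n > 0);
  std::string out(v.substr(0, n));  // copies at most n bytes
  out.resize(n, ' ');               // pads short values with spaces
  return out;
}

// Hypothetical converted dictionary: every entry is copied into a fixed-width
// slot, so the result always occupies entries.size() * n bytes. The
// STRING-style alternative is the `entries` vector itself: views that point
// into the page buffer and copy nothing.
std::string BuildPaddedDict(const std::vector<std::string_view>& entries,
                            std::size_t n) {
  std::string slots;
  slots.reserve(entries.size() * n);
  for (std::string_view e : entries) slots += ToCharN(e, n);
  return slots;
}
```

With many short entries and a large N, the padded slots dominate: the copy 
costs N bytes per entry regardless of how short the stored value is.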

On the other hand, doing the conversion during dictionary construction would 
enable dictionary filtering, which could be a major speedup for some queries.
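
As a rough illustration of why filtering wants converted entries, the sketch 
below evaluates an equality predicate once per dictionary entry and skips the 
row group when nothing matches. All names here (PadToCharN, ConvertDict, 
CanSkipRowGroup) are hypothetical, and the padding rule is an assumption; 
this is not how Impala's scanner is actually structured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: truncate or space-pad a value to exactly n bytes.
std::string PadToCharN(const std::string& v, std::size_t n) {
  assert(n > 0);
  std::string out = v.substr(0, n);
  out.resize(n, ' ');
  return out;
}

// Convert every dictionary entry once, at dictionary construction time.
std::vector<std::string> ConvertDict(const std::vector<std::string>& raw,
                                     std::size_t n) {
  std::vector<std::string> converted;
  converted.reserve(raw.size());
  for (const std::string& e : raw) converted.push_back(PadToCharN(e, n));
  return converted;
}

// Dictionary filtering for `col = literal`: if no dictionary entry equals the
// literal, no row in the row group can match and the group can be skipped.
bool CanSkipRowGroup(const std::vector<std::string>& dict,
                     const std::string& literal) {
  return std::find(dict.begin(), dict.end(), literal) == dict.end();
}
```

If the check ran against unconverted entries, a stored "ab" would not equal 
the CHAR(4) literal "ab  ", and the row group would be skipped incorrectly; 
with converted entries the comparison is made on the final values.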

> Push conversion and validation into dictionary construction
> -----------------------------------------------------------
>
>                 Key: IMPALA-4994
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4994
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Joe McDonnell
>            Assignee: Csaba Ringhofer
>            Priority: Major
>              Labels: ramp-up
>
> Certain data types require conversion and/or validation when read from a 
> Parquet file. For example, timestamps can require conversion to account for 
> different storage offsets. Char/varchar fields can require conversion to 
> handle lengths and space padding. Timestamps require validation, because not 
> all bit combinations are valid timestamps.
> Right now, this is done per element as it is read. For dictionary encoded 
> columns, it would save processing to do the conversion/validation once at 
> dictionary construction.
