[
https://issues.apache.org/jira/browse/IMPALA-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743149#comment-16743149
]
Csaba Ringhofer commented on IMPALA-4994:
-----------------------------------------
I have a possible performance concern about CHAR(N): doing the conversion
during dictionary construction could make the dictionary much larger if N is
large, and it would require a lot of copying. The current solution of treating
CHAR(N) the same way as STRING during dictionary construction has the
advantage that no copying is necessary; the dictionary simply contains
pointers into the buffer with the strings. Meanwhile, the conversion itself is
quite cheap (especially compared to TIMESTAMP), so we wouldn't gain much by
doing it for fewer elements.
On the other hand, doing the conversion during dictionary construction would
enable dictionary filtering, which could be a major speedup for some queries.
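
To make the trade-off concrete, here is a rough C++ sketch. It is not Impala's
actual code: PadToCharN, BuildViewDict, BuildConvertedDict, OwnedBytes and
DictMayMatch are hypothetical names, and the CHAR(N) conversion is assumed to
be plain truncation plus space padding. It contrasts a dictionary of views
into the page buffer (no copying) with a dictionary of converted values (N
bytes copied per entry, but directly comparable against a CHAR(N) literal for
dictionary filtering).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical conversion: truncate or space-pad a stored value to exactly n bytes.
std::string PadToCharN(std::string_view v, std::size_t n) {
  std::string out(v.substr(0, n));  // substr clamps the count to the value's length
  out.resize(n, ' ');
  assert(out.size() == n);
  return out;
}

// STRING-style dictionary: entries only point into the existing buffer, no copy.
std::vector<std::string_view> BuildViewDict(const std::vector<std::string>& page_values) {
  std::vector<std::string_view> dict;
  dict.reserve(page_values.size());
  for (const std::string& v : page_values) dict.emplace_back(v);
  return dict;
}

// Converted dictionary: every entry is copied and padded, costing n bytes each.
std::vector<std::string> BuildConvertedDict(const std::vector<std::string>& page_values,
                                            std::size_t n) {
  std::vector<std::string> dict;
  dict.reserve(page_values.size());
  for (const std::string& v : page_values) dict.push_back(PadToCharN(v, n));
  return dict;
}

// Bytes owned by the converted dictionary (the extra memory the concern is about).
std::size_t OwnedBytes(const std::vector<std::string>& dict) {
  std::size_t total = 0;
  for (const std::string& e : dict) total += e.size();
  return total;
}

// Dictionary filtering for an equality predicate: if no converted entry equals
// the (already padded) literal, no row in the column chunk can match.
bool DictMayMatch(const std::vector<std::string>& dict, std::string_view literal) {
  for (const std::string& e : dict) {
    if (std::string_view(e) == literal) return true;
  }
  return false;
}
```

With CHAR(8) and two stored values, the converted dictionary owns 16 bytes
while the view dictionary owns none; the gap grows linearly with N and with
the number of entries. In exchange, DictMayMatch can rule out a whole column
chunk by inspecting only the dictionary.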
> Push conversion and validation into dictionary construction
> -----------------------------------------------------------
>
> Key: IMPALA-4994
> URL: https://issues.apache.org/jira/browse/IMPALA-4994
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.9.0
> Reporter: Joe McDonnell
> Assignee: Csaba Ringhofer
> Priority: Major
> Labels: ramp-up
>
> Certain data types require conversion and/or validation when read from a
> Parquet file. For example, timestamps can require conversion to account for
> different storage offsets. Char/varchar fields can require conversion to
> handle lengths and space padding. Timestamps require validation, because not
> all bit combinations are valid timestamps.
> Right now, this is done per element as it is read. For dictionary encoded
> columns, it would save processing to do the conversion/validation once at
> dictionary construction.