[ https://issues.apache.org/jira/browse/IMPALA-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113323#comment-17113323 ]
Csaba Ringhofer commented on IMPALA-9747:
-----------------------------------------
[~tarmstrong] Apart from CHAR(N), other string types are also problematic if
unescaping is needed, and BINARY will add another case where codegen doesn't
seem like a good idea to me: it needs base64 decoding, which is currently done
by calling an external function, sasl_decode64().

My opinion is that for strings that require something complex for
materialization, it is better to just call a function than to try to write
complex, handcrafted codegened code.
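
To illustrate the "just call a function" approach, here is a minimal C++ sketch, not Impala's actual code. DecodeBase64() is a hand-written stand-in for an external decoder such as sasl_decode64(), and MaterializeBinarySlot() is a hypothetical name for the ordinary compiled function that a generated row loop could call for a BINARY column instead of inlining the decoding logic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in for an external base64 decoder (assumption: the real code would
// call a library routine). Returns false on malformed input.
bool DecodeBase64(const std::string& in, std::string* out) {
  assert(out != nullptr);
  out->clear();
  if (in.size() % 4 != 0) return false;
  auto value = [](char c) -> int {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
  };
  for (std::size_t i = 0; i < in.size(); i += 4) {
    int pad = 0;
    std::uint32_t triple = 0;
    for (std::size_t j = 0; j < 4; ++j) {
      char c = in[i + j];
      if (c == '=') {
        // Padding is only valid in the last two positions of the final group.
        if (i + 4 != in.size() || j < 2) return false;
        ++pad;
        triple <<= 6;
      } else {
        if (pad > 0) return false;  // data after padding
        int v = value(c);
        if (v < 0) return false;
        triple = (triple << 6) | static_cast<std::uint32_t>(v);
      }
    }
    out->push_back(static_cast<char>((triple >> 16) & 0xFF));
    if (pad < 2) out->push_back(static_cast<char>((triple >> 8) & 0xFF));
    if (pad < 1) out->push_back(static_cast<char>(triple & 0xFF));
  }
  return true;
}

// Hypothetical per-slot materializer: a plain function that generated code
// would call for BINARY fields. Returns false if the field cannot be decoded.
bool MaterializeBinarySlot(const std::string& field, std::string* slot) {
  return DecodeBase64(field, slot);
}
```

The generated code then only needs a single call instruction for such a column; all of the decoding complexity stays in normally compiled, testable C++.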
> More fine-grained codegen for text file scanners
> ------------------------------------------------
>
> Key: IMPALA-9747
> URL: https://issues.apache.org/jira/browse/IMPALA-9747
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Csaba Ringhofer
> Assignee: Daniel Becker
> Priority: Major
>
> Currently, if the materialization of any column cannot be codegend for some
> reason (e.g. it is CHAR(N)), then codegen is cancelled for the whole text
> scanner, see:
> https://github.com/apache/impala/blob/b5805de3e65fd1c7154e4169b323bb38ddc54f4f/be/src/exec/text-converter.cc#L112
> https://github.com/apache/impala/blob/58273fff601dcc763ac43f7cc275a174a2e18b6b/be/src/exec/hdfs-scanner.cc#L342
> It would be much better to use the non-codegend path only for the problematic
> columns, use the codegend materialization for the rest, and always do
> conjunct evaluation with codegen.
>
> The codegend path orders slots based on the conjuncts that use them and
> evaluates each conjunct as soon as the slots it needs become available, so if
> the row is dropped, the rest of the slots do not need to be materialized. A
> simple solution would be to always do non-codegend slot materialization first
> so that those slots are ready if a conjunct needs them. Moving the columns that
> are not used by conjuncts to the end could be a further optimization.
>
> This came up during the materialization of BINARY columns, which need
> base64 decoding during materialization.
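
The slot ordering proposed in the quoted description can be sketched roughly as follows. This is one possible reading, not Impala's implementation: SlotInfo, its fields, and OrderSlots() are hypothetical names. Slots that cannot be codegend but are read by a conjunct go first, codegend slots follow in the order of the first conjunct that reads them, and slots that no conjunct reads go last (the "further optimization").

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical description of one slot to materialize.
struct SlotInfo {
  int slot_id;
  bool can_codegen;    // false for e.g. CHAR(N) or BINARY in this sketch
  int first_conjunct;  // index of the first conjunct reading it, -1 if none
};

// Returns the slots in materialization order:
//   0. non-codegend slots used by a conjunct (ready before any conjunct runs)
//   1. codegend slots used by conjuncts, ordered by first conjunct
//   2. slots not used by any conjunct (skipped entirely if the row is dropped)
std::vector<SlotInfo> OrderSlots(std::vector<SlotInfo> slots) {
  auto rank = [](const SlotInfo& s) {
    if (s.first_conjunct < 0) return 2;
    return s.can_codegen ? 1 : 0;
  };
  // stable_sort keeps the original column order within each group.
  std::stable_sort(slots.begin(), slots.end(),
                   [&](const SlotInfo& a, const SlotInfo& b) {
                     int ra = rank(a), rb = rank(b);
                     if (ra != rb) return ra < rb;
                     if (ra == 1) return a.first_conjunct < b.first_conjunct;
                     return false;
                   });
  return slots;
}
```

With this ordering, a row rejected by an early conjunct never pays for the slots in the last group, including expensive non-codegend ones such as base64-decoded BINARY columns.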
--
This message was sent by Atlassian Jira
(v8.3.4#803005)