[jira] [Commented] (IMPALA-9747) More fine-grained codegen for text file scanners

Tim Armstrong (Jira) Thu, 21 May 2020 09:18:08 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113330#comment-17113330
 ]


Tim Armstrong commented on IMPALA-9747:
---------------------------------------

[~daniel.becker] yeah, you can definitely do it. The easiest way is probably to 
add a wrapper function in the cross-compiled IR and call that. That lets you 
use the existing infrastructure and makes sure you've got the function 
signature, name mangling, etc. right.
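
A minimal sketch of what such a wrapper could look like, assuming a 
hypothetical library routine (hypothetical_lib::Base64MaxDecodedLen is made up 
for illustration and is not an Impala or library API). The extern "C" linkage 
is also an assumption, used here only to give the wrapper an unmangled name; 
the compiler takes care of the callee's mangled name and signature.

```cpp
#include <cstdint>

// Hypothetical stand-in for a function that lives in a library and has an
// ordinary (mangled) C++ name.
namespace hypothetical_lib {
// Upper bound on the decoded size of a base64 string of `encoded_len` bytes.
int64_t Base64MaxDecodedLen(int64_t encoded_len) {
  return (encoded_len + 3) / 4 * 3;
}
}  // namespace hypothetical_lib

// Thin wrapper intended to be cross-compiled to IR. Generated code calls the
// wrapper by its simple name; the compiler emits the correct call to the
// library function, so the signature and mangling cannot drift.
extern "C" int64_t Base64MaxDecodedLenWrapper(int64_t encoded_len) {
  return hypothetical_lib::Base64MaxDecodedLen(encoded_len);
}
```

The wrapper adds one call level, which can be inlined away if the callee's 
body is also available in the IR module.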

Otherwise you can construct a function prototype matching the one in the 
library, add it to the IR, and call that. 
https://mapping-high-level-constructs-to-llvm-ir.readthedocs.io/en/latest/basic-constructs/functions.html#function-prototypes
 has some examples of declare vs. define in the IR.
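
For comparison, here is the same declare vs. define distinction at the C++ 
level, as a rough sketch. HypotheticalLibAdd is a made-up name; in the real 
case the definition would live in the library rather than in the same file. 
The IR lines in the comments are approximate and omit attributes.

```cpp
#include <cstdint>

// Prototype only. In IR this corresponds roughly to:
//   declare i32 @HypotheticalLibAdd(i32, i32)
// It must match the library's signature exactly, since nothing checks it.
extern "C" int32_t HypotheticalLibAdd(int32_t a, int32_t b);

// A caller that sees nothing but the prototype, like generated code would.
int32_t CallThroughPrototype(int32_t x) {
  return HypotheticalLibAdd(x, 1);
}

// Definition. In IR this corresponds roughly to:
//   define i32 @HypotheticalLibAdd(i32 %a, i32 %b) { ... }
// It is placed here only so that the sketch links on its own.
extern "C" int32_t HypotheticalLibAdd(int32_t a, int32_t b) {
  return a + b;
}
```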

When the codegen module is compiled, it first tries to link function calls to 
functions in the IR module. If a function isn't there, it does a dlsym to find 
the symbol in the current process, which will find any exported symbols in the 
impalad binary.
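
The lookup order can be modelled roughly as follows. This is a simplified 
sketch, not the actual codegen code: ResolveSymbol, Resolution and the map 
standing in for the IR module are all hypothetical, and the process-wide lookup 
is injected as a callback so that the example runs without a real dlsym call.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

enum class ResolvedFrom { kIrModule, kProcess, kNotFound };

struct Resolution {
  ResolvedFrom from;
  std::uintptr_t address;  // 0 when the symbol was not found
};

// Resolves `name` by checking the IR module first and falling back to a
// process-wide lookup. `process_lookup` stands in for dlsym() and returns 0
// when the process exports no such symbol.
Resolution ResolveSymbol(
    const std::string& name,
    const std::unordered_map<std::string, std::uintptr_t>& ir_module,
    const std::function<std::uintptr_t(const std::string&)>& process_lookup) {
  auto it = ir_module.find(name);
  if (it != ir_module.end()) return {ResolvedFrom::kIrModule, it->second};
  std::uintptr_t addr = process_lookup(name);
  if (addr != 0) return {ResolvedFrom::kProcess, addr};
  return {ResolvedFrom::kNotFound, 0};
}
```

One consequence of this order is that a function defined in the IR module 
shadows an exported symbol of the same name in the binary.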

> More fine-grained codegen for text file scanners
> ------------------------------------------------
>
>                 Key: IMPALA-9747
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9747
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Assignee: Daniel Becker
>            Priority: Major
>
> Currently, if the materialization of any column cannot be codegend for some 
> reason (e.g. it is CHAR(N)), then codegen is cancelled for the whole text 
> scanner, see:
> https://github.com/apache/impala/blob/b5805de3e65fd1c7154e4169b323bb38ddc54f4f/be/src/exec/text-converter.cc#L112
> https://github.com/apache/impala/blob/58273fff601dcc763ac43f7cc275a174a2e18b6b/be/src/exec/hdfs-scanner.cc#L342
> It would be much better to use the non-codegend path only for the problematic 
> columns, use the codegend materialization for the rest, and always do 
> conjunct evaluation with codegen.
> The codegend path orders slots based on the conjuncts that use them and 
> evaluates each conjunct as soon as the slots it needs become available, so if 
> the row is dropped then the rest of the slots do not need to be materialized. 
> A simple solution would be to always do non-codegend slot materialization 
> first so that those slots are ready if a conjunct needs them. Moving the 
> columns that are not used by conjuncts to the end could be a further 
> optimization.
> This came up during the materialization of BINARY columns, which need 
> base64 decoding during materialization.
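
One possible reading of the slot ordering proposed in the description, as a 
small sketch. The Slot struct, its field names and OrderSlots are hypothetical 
and do not correspond to Impala's descriptors; the sketch only shows the 
ordering, with the "further optimization" of moving slots unused by conjuncts 
to the end already applied.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical slot description; field names are illustrative only.
struct Slot {
  std::string name;
  bool codegen_supported;  // false for e.g. CHAR(N)
  bool used_by_conjunct;   // true if any conjunct reads this slot
};

// Orders slots so that:
//   0) non-codegend slots read by conjuncts are materialized first,
//   1) codegend slots read by conjuncts follow,
//   2) slots no conjunct reads go last, so they can be skipped entirely
//      when a conjunct drops the row.
// stable_sort keeps the incoming relative order within each group.
std::vector<Slot> OrderSlots(std::vector<Slot> slots) {
  auto rank = [](const Slot& s) {
    if (!s.used_by_conjunct) return 2;
    return s.codegen_supported ? 1 : 0;
  };
  std::stable_sort(slots.begin(), slots.end(),
                   [&](const Slot& a, const Slot& b) {
                     return rank(a) < rank(b);
                   });
  return slots;
}
```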



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
