[jira] [Commented] (IMPALA-9873) Skip decoding of non-materialised columns in Parquet

Amogh Margoor (Jira) Wed, 28 Jul 2021 08:07:05 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388833#comment-17388833
 ]


Amogh Margoor commented on IMPALA-9873:
---------------------------------------

https://docs.google.com/document/d/1QFu_Zu9nHuMpu5Pqb3qe62MbZPA88j_o7NtpZ2a2zSA/edit?usp=sharing.

> Skip decoding of non-materialised columns in Parquet
> ----------------------------------------------------
>
>                 Key: IMPALA-9873
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9873
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Amogh Margoor
>            Priority: Major
>
> This is a first milestone for lazy materialization in parquet, focusing on 
> avoiding decompression and decoding of columns.
> * Identify columns referenced by predicates and runtime row filters and 
> determine what order the columns need to be materialised in. Probably we want 
> to evaluate static predicates before runtime filters to match current 
> behaviour.
> * Rework this loop so that it alternates between materialising columns and 
> evaluating predicates: 
> https://github.com/apache/impala/blob/052129c/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1110
> * We probably need to keep track of filtered rows using a new data structure, 
> e.g. bitmap
> * We need to then check that bitmap at each step to see if we skip 
> materialising part or all of the following columns. E.g. if the first N rows 
> were pruned, we can skip forward the remaining readers N rows.
> * This part may be a little tricky - there is the risk of adding overhead 
> compared to the current code.
> * It is probably OK to just materialise the partition columns to start off 
> with - avoiding materialising those is not going to buy that much.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (IMPALA-9873) Skip decoding of non-materialised columns in Parquet

Reply via email to