[jira] [Commented] (IMPALA-9873) Skip decoding of non-materialised columns in Parquet

ASF subversion and git services (Jira) Wed, 03 Nov 2021 08:33:07 -0700


    [ https://issues.apache.org/jira/browse/IMPALA-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438143#comment-17438143 ]


ASF subversion and git services commented on IMPALA-9873:
---------------------------------------------------------

Commit ef2a8f6f57c8feb11197d2e632e18e65e05cc4ab in impala's branch 
refs/heads/master from Amogh Margoor
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ef2a8f6 ]

IMPALA-9873: (addendum) Fix test case for scratch_tuple_batch

This patch contains two minor fixes:
1. Fix the scratch_tuple_batch test, which was causing a failure in
   the ASAN build (IMPALA-10998).
2. Remove a DCHECK that is not needed and is triggered by
   cancellation tests (IMPALA-11000).

Change-Id: I74ee41718745b8dca26f88082d3f2efe474e3bf9
Reviewed-on: http://gerrit.cloudera.org:8080/17992
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Skip decoding of non-materialised columns in Parquet
> ----------------------------------------------------
>
>                 Key: IMPALA-9873
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9873
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Amogh Margoor
>            Priority: Major
>
> This is a first milestone for lazy materialization in Parquet, focusing on 
> avoiding decompression and decoding of columns.
> * Identify columns referenced by predicates and runtime row filters and 
> determine what order the columns need to be materialised in. Probably we want 
> to evaluate static predicates before runtime filters to match current 
> behaviour.
> * Rework this loop so that it alternates between materialising columns and 
> evaluating predicates: 
> https://github.com/apache/impala/blob/052129c/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1110
> * We probably need to keep track of filtered rows using a new data structure, 
> e.g. a bitmap.
> * We then need to check that bitmap at each step to see whether we can skip 
> materialising part or all of the following columns. E.g. if the first N rows 
> were pruned, we can skip the remaining readers forward by N rows.
> * This part may be a little tricky - there is the risk of adding overhead 
> compared to the current code.
> * It is probably OK to just materialise the partition columns to start off 
> with - avoiding materialising those is not going to buy that much.
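
The steps in the description above can be sketched roughly as follows. This is only an illustration of the idea, not Impala's implementation: ColumnReader, surviving_runs, materialise and scan_batch are hypothetical names, the bitmap is a plain list of booleans, and static predicates and runtime filters are both modelled as callables that the caller supplies in evaluation order.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


class ColumnReader:
    """Hypothetical stand-in for a Parquet column reader (not Impala's class)."""

    def __init__(self, values: Sequence[int]) -> None:
        self._values = list(values)
        self._pos = 0
        self.decoded = 0  # counts values that were actually decoded

    def skip(self, n: int) -> None:
        # Advance past n rows without decoding them.
        self._pos += n

    def read(self, n: int) -> List[int]:
        # Decode the next n rows.
        out = self._values[self._pos:self._pos + n]
        self._pos += n
        self.decoded += len(out)
        return out


def surviving_runs(bitmap: Sequence[bool]) -> Iterator[Tuple[int, int]]:
    """Yield (start, length) for each maximal run of rows still set in bitmap."""
    start = None
    for i, keep in enumerate(bitmap):
        if keep and start is None:
            start = i
        elif not keep and start is not None:
            yield start, i - start
            start = None
    if start is not None:
        yield start, len(bitmap) - start


def materialise(reader: ColumnReader, bitmap: Sequence[bool]) -> Dict[int, int]:
    """Decode only the surviving rows; pruned rows are skipped, not decoded."""
    out: Dict[int, int] = {}
    pos = 0
    for start, length in surviving_runs(bitmap):
        reader.skip(start - pos)  # jump over the pruned gap
        for offset, value in enumerate(reader.read(length)):
            out[start + offset] = value
        pos = start + length
    reader.skip(len(bitmap) - pos)  # leave the reader at the end of the batch
    return out


def scan_batch(
    num_rows: int,
    predicate_columns: Sequence[Tuple[str, ColumnReader, Callable[[int], bool]]],
    other_columns: Sequence[Tuple[str, ColumnReader]],
) -> List[Dict[str, int]]:
    """Alternate between materialising a column and evaluating its predicate."""
    bitmap = [True] * num_rows  # True means the row has not been filtered out
    rows: Dict[int, Dict[str, int]] = {}
    for name, reader, predicate in predicate_columns:
        for row, value in materialise(reader, bitmap).items():
            if predicate(value):
                rows.setdefault(row, {})[name] = value
            else:
                bitmap[row] = False  # later columns will skip this row
                rows.pop(row, None)
    for name, reader in other_columns:
        for row, value in materialise(reader, bitmap).items():
            rows.setdefault(row, {})[name] = value
    return [rows[r] for r in sorted(rows)]


# Example batch of six rows: predicates on a and b, c is only projected.
a = ColumnReader([1, 2, 3, 4, 5, 6])
b = ColumnReader([10, 20, 30, 40, 50, 60])
c = ColumnReader([7, 8, 9, 10, 11, 12])
result = scan_batch(
    6,
    [("a", a, lambda v: v > 2), ("b", b, lambda v: v < 60)],
    [("c", c)],
)
```

In this example the first predicate prunes rows 0 and 1 and the second prunes row 5, so a decodes all six values, b decodes four, and c decodes three. The per-run bookkeeping in materialise is also where the overhead risk mentioned in the description would show up if surviving rows are heavily fragmented.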



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
