[ https://issues.apache.org/jira/browse/IMPALA-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438143#comment-17438143 ]
ASF subversion and git services commented on IMPALA-9873:
---------------------------------------------------------
Commit ef2a8f6f57c8feb11197d2e632e18e65e05cc4ab in impala's branch
refs/heads/master from Amogh Margoor
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ef2a8f6 ]
IMPALA-9873: (addendum) Fix test case for scratch_tuple_batch
The patch contains two minor fixes:
1. Fix the scratch_tuple_batch test that was causing failures in the ASAN
build (IMPALA-10998).
2. Remove a DCHECK that is not needed and is triggered by
cancellation tests (IMPALA-11000).
Change-Id: I74ee41718745b8dca26f88082d3f2efe474e3bf9
Reviewed-on: http://gerrit.cloudera.org:8080/17992
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Skip decoding of non-materialised columns in Parquet
> ----------------------------------------------------
>
> Key: IMPALA-9873
> URL: https://issues.apache.org/jira/browse/IMPALA-9873
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Reporter: Tim Armstrong
> Assignee: Amogh Margoor
> Priority: Major
>
> This is a first milestone for lazy materialization in Parquet, focusing on
> avoiding decompression and decoding of columns.
> * Identify columns referenced by predicates and runtime row filters and
> determine what order the columns need to be materialised in. Probably we want
> to evaluate static predicates before runtime filters to match current
> behaviour.
> * Rework this loop so that it alternates between materialising columns and
> evaluating predicates:
> https://github.com/apache/impala/blob/052129c/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1110
> * We probably need to keep track of filtered rows using a new data structure,
> e.g. a bitmap.
> * We then need to check that bitmap at each step to see whether we can skip
> materialising part or all of the following columns. E.g. if the first N rows
> were pruned, we can skip the remaining readers forward by N rows.
> * This part may be a little tricky - there is the risk of adding overhead
> compared to the current code.
> * It is probably OK to just materialise the partition columns to start off
> with - avoiding materialising those is not going to buy that much.
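
The bitmap-and-skip idea in the list above can be sketched as follows. This is
only an illustration under stated assumptions, not the code from the commit or
from hdfs-parquet-scanner.cc: RowRange, BuildRanges, ToyColumnReader and
MaterialiseSurvivors are hypothetical names, the "bitmap" is a plain
std::vector<bool>, and the toy reader holds already-decoded values so that
skipping is just a cursor bump. The bitmap is collapsed into runs of kept and
pruned rows, and the reader for a later column either decodes or skips each
run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical: a run of consecutive rows that are all kept or all pruned.
struct RowRange {
  std::size_t start;
  std::size_t length;
  bool keep;
};

// Collapses a per-row survival bitmap (true = row passed the predicates
// evaluated so far) into maximal runs of kept / pruned rows.
std::vector<RowRange> BuildRanges(const std::vector<bool>& surviving) {
  std::vector<RowRange> ranges;
  for (std::size_t i = 0; i < surviving.size(); ++i) {
    if (!ranges.empty() && ranges.back().keep == surviving[i]) {
      ++ranges.back().length;  // Extend the current run.
    } else {
      ranges.push_back({i, 1, surviving[i]});  // Start a new run.
    }
  }
  return ranges;
}

// Hypothetical stand-in for a column reader. A real Parquet reader would
// skip encoded values or whole pages; here skipping only moves a cursor.
class ToyColumnReader {
 public:
  explicit ToyColumnReader(std::vector<int64_t> values)
      : values_(std::move(values)) {}

  void SkipRows(std::size_t n) {
    assert(pos_ + n <= values_.size());
    pos_ += n;
    rows_skipped_ += n;
  }

  void ReadRows(std::size_t n, std::vector<int64_t>* out) {
    assert(pos_ + n <= values_.size());
    for (std::size_t i = 0; i < n; ++i) out->push_back(values_[pos_++]);
    rows_decoded_ += n;
  }

  std::size_t rows_skipped() const { return rows_skipped_; }
  std::size_t rows_decoded() const { return rows_decoded_; }

 private:
  std::vector<int64_t> values_;
  std::size_t pos_ = 0;
  std::size_t rows_skipped_ = 0;
  std::size_t rows_decoded_ = 0;
};

// Materialises only the rows still alive in the bitmap, skipping the reader
// forward over every pruned run. Assumes the reader holds at least
// surviving.size() remaining values.
std::vector<int64_t> MaterialiseSurvivors(ToyColumnReader* reader,
                                          const std::vector<bool>& surviving) {
  std::vector<int64_t> out;
  for (const RowRange& range : BuildRanges(surviving)) {
    if (range.keep) {
      reader->ReadRows(range.length, &out);
    } else {
      reader->SkipRows(range.length);
    }
  }
  return out;
}
```

Calling MaterialiseSurvivors with a bitmap whose first N entries are false
makes the first operation a single SkipRows(N), which is the "skip forward N
rows" case from the description. Working on runs rather than individual rows
is one way to limit the per-row overhead that the description warns about, but
whether it pays off would depend on how selective the predicates are.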
--
This message was sent by Atlassian Jira
(v8.3.4#803005)