Tim Armstrong created IMPALA-9873:
-------------------------------------

             Summary: Skip decoding of non-materialised columns in Parquet
                 Key: IMPALA-9873
                 URL: https://issues.apache.org/jira/browse/IMPALA-9873
             Project: IMPALA
          Issue Type: Sub-task
          Components: Backend
            Reporter: Tim Armstrong


This is a first milestone for lazy materialisation in Parquet, focusing on 
avoiding decompression and decoding of columns.

* Identify columns referenced by predicates and runtime row filters and 
determine what order the columns need to be materialised in. Probably we want 
to evaluate static predicates before runtime filters to match current behaviour.
* Rework this loop so that it alternates between materialising columns and 
evaluating predicates: 
https://github.com/apache/impala/blob/052129c/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1110
* We probably need to keep track of filtered rows using a new data structure, 
e.g. a bitmap.
* We then need to check that bitmap at each step to see whether we can skip 
materialising part or all of the following columns. E.g. if the first N rows 
were pruned, we can skip the remaining readers forward by N rows.
* This part may be a little tricky - there is the risk of adding overhead 
compared to the current code.
* It is probably OK to just materialise the partition columns to start off with 
- avoiding materialising those is not going to buy that much.
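
The steps above could be sketched roughly as follows. This is only an 
illustration under stated assumptions, not Impala's actual scanner code: 
ColumnReader, Stage and ScanBatch are hypothetical names, std::vector<bool> 
stands in for the bitmap, and "decoding" is simulated by reading from an 
in-memory vector. The caller is assumed to order the stages so that columns 
with static predicates come before columns with runtime filters.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a Parquet column reader (not Impala's class).
struct ColumnReader {
  std::vector<int64_t> encoded;  // pretend these values still need decoding
  size_t pos = 0;                // index of the next unread row
  size_t decoded_count = 0;      // number of values actually decoded

  // Advance past n rows without decoding them.
  void SkipRows(size_t n) { pos += n; }

  // Decode and return the value for the next row.
  int64_t DecodeNext() {
    ++decoded_count;
    return encoded[pos++];
  }
};

// One step of the scan: a column plus an optional predicate on its value.
// An empty predicate means the column is only materialised, never filtered on.
struct Stage {
  ColumnReader* reader;
  std::function<bool(int64_t)> predicate;
};

// Alternates between materialising a column and evaluating its predicate.
// Returns the bitmap of surviving rows; out[c][r] holds the decoded value of
// column c at row r for rows that were still selected when c was read.
std::vector<bool> ScanBatch(std::vector<Stage>& stages, size_t num_rows,
                            std::vector<std::vector<int64_t>>* out) {
  std::vector<bool> selected(num_rows, true);
  out->assign(stages.size(), std::vector<int64_t>(num_rows, 0));
  for (size_t c = 0; c < stages.size(); ++c) {
    Stage& s = stages[c];
    size_t row = 0;
    while (row < num_rows) {
      // Measure the run of already-pruned rows and skip it in one call.
      size_t run = 0;
      while (row + run < num_rows && !selected[row + run]) ++run;
      if (run > 0) {
        s.reader->SkipRows(run);
        row += run;
        continue;
      }
      int64_t v = s.reader->DecodeNext();
      (*out)[c][row] = v;
      // Clear the bit so later columns skip this row.
      if (s.predicate && !s.predicate(v)) selected[row] = false;
      ++row;
    }
  }
  return selected;
}
```

In this sketch the first column is always fully decoded, and every later 
column decodes only the rows whose bit is still set. The per-row bitmap checks 
are the kind of overhead the issue warns about, so a real implementation would 
likely want to work on runs or whole pages rather than single rows.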



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

