Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/2779
Change subject: PREVIEW: Basic column-wise slot materialization in Parquet scanner.
......................................................................

PREVIEW: Basic column-wise slot materialization in Parquet scanner.

This change is a first step towards a more efficient Parquet scanner. It focuses on presenting the new code flow that materializes the table-level slots in a column-wise fashion, without going deep into actually improving scan efficiency. After these changes, there are several obvious places that should be optimized to realize efficiency gains.

Summary of changes
- the table-level tuples are materialized in a column-wise fashion with new ColumnReader::ReadValueBatch() functions
- this is done by materializing a 'scratch' batch, and transferring scratch tuples that survive filters/conjuncts to the output batch
- the tuples of nested collections are still materialized in a row-wise fashion using the ColumnReader::ReadValue() function, just as before

Mini benchmark
I ran the following queries on a single impalad before and after my change, using a synthetic 'huge_lineitem' table. I modified hdfs-scan-node.cc to set the number of rows of every row batch to 0, so that the measurement focuses on the scan time.
Query options:
set num_scanner_threads=1;
set disable_codegen=true;
set num_nodes=1;

select * from huge_lineitem;
Before: 22.39s
After: 18.50s

select * from huge_lineitem where l_linenumber < 0;
Before: 25.11s
After: 20.56s

select * from huge_lineitem where l_linenumber % 2 = 0;
Before: 26.32s
After: 21.82s

Change-Id: I72a613fa805c542e39df20588fb25c57b5f139aa
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
2 files changed, 373 insertions(+), 144 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/79/2779/1
--
To view, visit http://gerrit.cloudera.org:8080/2779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I72a613fa805c542e39df20588fb25c57b5f139aa
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <[email protected]>
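The column-wise flow from the summary of changes (fill a scratch batch one column at a time, then keep only the scratch tuples that pass all conjuncts) can be illustrated with a toy C++ model. Everything here is hypothetical and heavily simplified: Tuple, the in-memory ColumnReader, ScanBatch, Conjunct and RunDemo are illustrative names only. This is not Impala's actual ColumnReader::ReadValueBatch() signature, it decodes no Parquet pages, and it copies surviving tuples instead of transferring memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical simplified tuple with two int64 slots,
// e.g. slot 0 = l_orderkey, slot 1 = l_linenumber.
struct Tuple {
  int64_t slots[2];
};

// Hypothetical column reader over an in-memory column (assumption: the real
// reader decodes Parquet data pages; this one just copies values).
class ColumnReader {
 public:
  ColumnReader(std::vector<int64_t> values, int slot_idx)
      : values_(std::move(values)), slot_idx_(slot_idx) {}

  // Column-wise materialization: write this column's value into its slot for
  // up to max_values consecutive scratch tuples. Returns the number written.
  size_t ReadValueBatch(size_t max_values, std::vector<Tuple>* scratch) {
    size_t n = 0;
    while (n < max_values && pos_ < values_.size()) {
      (*scratch)[n].slots[slot_idx_] = values_[pos_++];
      ++n;
    }
    return n;
  }

 private:
  std::vector<int64_t> values_;
  int slot_idx_;
  size_t pos_ = 0;
};

using Conjunct = std::function<bool(const Tuple&)>;

// Fills one scratch batch column by column, then appends the scratch tuples
// that pass every conjunct to 'output'. Returns the number of scratch rows
// materialized (0 once all columns are exhausted).
size_t ScanBatch(std::vector<ColumnReader>& readers, size_t batch_size,
                 const std::vector<Conjunct>& conjuncts,
                 std::vector<Tuple>* scratch, std::vector<Tuple>* output) {
  assert(!readers.empty());
  scratch->resize(batch_size);
  size_t num_rows = batch_size;
  for (ColumnReader& reader : readers) {
    size_t n = reader.ReadValueBatch(batch_size, scratch);
    // Assumption: all columns have the same length; use the minimum to be safe.
    if (n < num_rows) num_rows = n;
  }
  // Evaluate conjuncts row by row over the fully materialized scratch tuples.
  for (size_t i = 0; i < num_rows; ++i) {
    bool keep = true;
    for (const Conjunct& c : conjuncts) {
      if (!c((*scratch)[i])) {
        keep = false;
        break;
      }
    }
    if (keep) output->push_back((*scratch)[i]);
  }
  return num_rows;
}

// Scans 5 rows in scratch batches of 2 with the predicate l_linenumber % 2 = 0.
std::vector<Tuple> RunDemo() {
  std::vector<ColumnReader> readers;
  readers.emplace_back(std::vector<int64_t>{10, 20, 30, 40, 50}, 0);
  readers.emplace_back(std::vector<int64_t>{1, 2, 3, 4, 5}, 1);
  std::vector<Conjunct> conjuncts = {
      [](const Tuple& t) { return t.slots[1] % 2 == 0; }};
  std::vector<Tuple> scratch;
  std::vector<Tuple> output;
  while (ScanBatch(readers, 2, conjuncts, &scratch, &output) > 0) {
  }
  return output;
}
```

In this toy model RunDemo() keeps the two tuples whose second slot is even, (20, 2) and (40, 4). Each reader runs a tight loop over a single column before any predicate is evaluated, which is the column-wise access pattern the change introduces for table-level slots. Nested collections are not modeled here.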
