Alex Behm has submitted this change and it was merged.

Change subject: IMPALA-2736: Optimized ReadValueBatch() for Parquet scalar column readers.
......................................................................

IMPALA-2736: Optimized ReadValueBatch() for Parquet scalar column readers.

This change builds on top of the recent move to column-wise materialization of
scalar values in the Parquet scanner. The goal of this patch is to improve
scan efficiency and to show the future direction for all column readers.

Major TODO:
The current patch has minor code duplication/redundancy, and the new
ReadValueBatch() departs from (but improves) the existing column reader
control flow. To improve code reuse and readability we should overhaul all
column readers to be more uniform.

Summary of changes:
- refactor ReadValueBatch() to simplify control flow
- introduce caching of def/rep levels for faster level decoding, and for
  a tighter value materialization loop
- new templated function for value materialization that takes the value
  encoding as a template argument

Mini benchmark vs. cdh5-trunk
I ran the following queries on a single impalad before and after my change
using a synthetic 'huge_lineitem' table. I modified hdfs-scan-node.cc to set
the number of rows of any row batch to 0 to focus the measurement on the
scan time.

Query options:
set num_scanner_threads=1;
set disable_codegen=true;
set num_nodes=1;

select * from huge_lineitem;
Before: 22.39s
After:  13.62s

select * from huge_lineitem where l_linenumber < 0;
Before: 25.11s
After:  17.73s

select * from huge_lineitem where l_linenumber % 2 = 0;
Before: 26.32s
After:  16.68s

select l_linenumber from huge_lineitem;
Before: 1.74s
After:  0.92s

Testing: I ran a private exhaustive build and all tests passed.
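
For readers unfamiliar with the pattern, the two ideas in the summary of changes (cached definition levels feeding a tight loop, and a materialization function templated on the value encoding) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in hdfs-parquet-scanner.cc: LevelCache, ColumnChunk, Slot, ReadValue, MaterializeBatch, and ReadBatch are hypothetical names, the values are plain int32_t, and the levels are assumed to be already decoded into a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical encoding tags for this sketch (not Impala's actual enum).
enum class Encoding { kPlain, kDictionary };

// Hypothetical cache of already-decoded definition levels. Decoding levels
// in bulk up front lets the materialization loop read them with a simple
// array access instead of calling into a level decoder per value.
class LevelCache {
 public:
  explicit LevelCache(std::vector<uint8_t> levels) : levels_(std::move(levels)) {}
  bool HasNext() const { return pos_ < levels_.size(); }
  uint8_t Next() { return levels_[pos_++]; }

 private:
  std::vector<uint8_t> levels_;
  std::size_t pos_ = 0;
};

// Hypothetical column data: either plain values, or dictionary + indices.
struct ColumnChunk {
  std::vector<int32_t> plain_values;
  std::vector<int32_t> dictionary;
  std::vector<uint32_t> dict_indices;
  std::size_t value_pos = 0;
};

struct Slot {
  bool is_null;
  int32_t value;
};

// The encoding is a template argument, so the branch below is a
// compile-time constant and drops out of each instantiation.
template <Encoding ENCODING>
inline int32_t ReadValue(ColumnChunk* chunk) {
  if (ENCODING == Encoding::kDictionary) {
    return chunk->dictionary[chunk->dict_indices[chunk->value_pos++]];
  }
  return chunk->plain_values[chunk->value_pos++];
}

// Materializes up to max_rows slots. A definition level below
// max_def_level means the value is NULL and no encoded value is consumed.
template <Encoding ENCODING>
std::size_t MaterializeBatch(LevelCache* def_levels, uint8_t max_def_level,
                             ColumnChunk* chunk, std::size_t max_rows,
                             std::vector<Slot>* out) {
  assert(def_levels != nullptr && chunk != nullptr && out != nullptr);
  std::size_t num_rows = 0;
  while (num_rows < max_rows && def_levels->HasNext()) {
    if (def_levels->Next() < max_def_level) {
      out->push_back({true, 0});
    } else {
      out->push_back({false, ReadValue<ENCODING>(chunk)});
    }
    ++num_rows;
  }
  return num_rows;
}

// The encoding is dispatched once per batch rather than once per value.
std::size_t ReadBatch(Encoding encoding, LevelCache* def_levels,
                      uint8_t max_def_level, ColumnChunk* chunk,
                      std::size_t max_rows, std::vector<Slot>* out) {
  switch (encoding) {
    case Encoding::kPlain:
      return MaterializeBatch<Encoding::kPlain>(def_levels, max_def_level,
                                                chunk, max_rows, out);
    case Encoding::kDictionary:
      return MaterializeBatch<Encoding::kDictionary>(def_levels, max_def_level,
                                                     chunk, max_rows, out);
  }
  return 0;
}
```

The design point of the sketch is that the per-value loop contains no encoding check and no level-decoder call; both are hoisted to per-batch work.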

Change-Id: I21fa9b050a45f2dd45cc0091ea5b008d3c0a3f30
Reviewed-on: http://gerrit.cloudera.org:8080/2843
Reviewed-by: Alex Behm <[email protected]>
Tested-by: Alex Behm <[email protected]>
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/util/rle-encoding.h
3 files changed, 356 insertions(+), 133 deletions(-)

Approvals:
  Alex Behm: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/2843
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I21fa9b050a45f2dd45cc0091ea5b008d3c0a3f30
Gerrit-PatchSet: 11
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
