Alex Behm has uploaded a new patch set (#2).

Change subject: IMPALA-3905: Add HdfsScanner::GetNext() interface and implementation for Parquet.
......................................................................

IMPALA-3905: Add HdfsScanner::GetNext() interface and implementation for Parquet.

This is a first step towards making our scan node single-threaded, since we are moving to an execution model where multi-threading is done at the fragment level.

This patch adds a new synchronous HdfsScanner::GetNext() interface and implements it for the Parquet scanner. Asynchronous execution via HdfsScanner::ProcessSplit() is still supported. For code sharing purposes, it is implemented by repeatedly calling GetNext().

I have not yet added a single-threaded scan node that uses GetNext(). The new code will be exercised by the existing scan node and tests.

Testing:
I ran the scanner tests and the TPCDS tests locally on core. I am still validating the performance of the existing multi-threaded scan node.

Change-Id: Iab50770bac05afcda4d3404fb4f53a0104931eb0
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet-column-readers.h
M common/thrift/ImpalaInternalService.thrift
10 files changed, 359 insertions(+), 259 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/32/3732/2
--
To view, visit http://gerrit.cloudera.org:8080/3732

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab50770bac05afcda4d3404fb4f53a0104931eb0
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm
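
The layering described in the commit message can be sketched roughly as follows. A synchronous GetNext() fills one batch at a time, and ProcessSplit() is built on top of it by calling GetNext() in a loop and handing each non-empty batch to a consumer. This is a minimal C++ sketch under stated assumptions: Status, RowBatch, Scanner, ToyScanner and the BatchSink callback are hypothetical, simplified stand-ins, not the actual declarations in the patch. In particular, the real HdfsScanner::GetNext() signature and the way ProcessSplit() passes batches to the scan node may differ.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a status type (not Impala's Status).
struct Status {
  bool ok = true;
  std::string msg;
  static Status OK() { return Status{}; }
};

// Hypothetical stand-in for a row batch; ints stand in for materialized tuples.
struct RowBatch {
  std::vector<int> rows;
};

// Hypothetical scanner base class illustrating the GetNext()/ProcessSplit() layering.
class Scanner {
 public:
  // Consumer of finished batches (stands in for the multi-threaded scan node).
  using BatchSink = std::function<void(std::unique_ptr<RowBatch>)>;

  virtual ~Scanner() = default;

  // Synchronous interface: fills 'batch' with up to one batch worth of rows
  // and sets '*eos' to true once the split is exhausted.
  virtual Status GetNext(RowBatch* batch, bool* eos) = 0;

  // Async-style entry point implemented on top of GetNext(), so both
  // execution models share the same scanning code.
  Status ProcessSplit(const BatchSink& sink) {
    bool eos = false;
    while (!eos) {
      auto batch = std::make_unique<RowBatch>();
      Status status = GetNext(batch.get(), &eos);
      if (!status.ok) return status;
      // Only hand off batches that actually contain rows.
      if (!batch->rows.empty()) sink(std::move(batch));
    }
    return Status::OK();
  }
};

// Hypothetical toy scanner that produces the integers [0, num_rows)
// in batches of at most batch_size rows.
class ToyScanner : public Scanner {
 public:
  ToyScanner(int num_rows, int batch_size)
      : num_rows_(num_rows), batch_size_(batch_size) {}

  Status GetNext(RowBatch* batch, bool* eos) override {
    while (next_row_ < num_rows_ &&
           static_cast<int>(batch->rows.size()) < batch_size_) {
      batch->rows.push_back(next_row_++);
    }
    *eos = next_row_ >= num_rows_;
    return Status::OK();
  }

 private:
  int num_rows_;
  int batch_size_;
  int next_row_ = 0;
};
```

For example, driving ToyScanner(10, 4) through ProcessSplit() hands the sink three batches of 4, 4 and 2 rows. A single-threaded scan node could instead call GetNext() directly from its own GetNext(), without any batch queue in between.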
