Alex Behm has submitted this change and it was merged.

Change subject: IMPALA-3905: Add single-threaded scan node.
......................................................................

IMPALA-3905: Add single-threaded scan node.

Adds a new single-threaded scan node HdfsScanNodeMt that
materializes tuples in the thread calling GetNext(). The new
scan node uses the HdfsScanner::GetNext() interface, which is
currently only implemented for Parquet. As before, I/O is
performed asynchronously via the I/O manager.

The new scan node is enabled if the mt_dop query option is set
to a value greater than 1. Otherwise, the existing multi-threaded
scan node is used.

The changes are mostly a refactoring of the existing multi-threaded
scan node to separate out the code shared between it and the new
single-threaded one.

Summary of changes:
- Move code from hdfs-scan-node.h/cc into a new
  hdfs-scan-node-base.h/cc
- Add new single-threaded scan node in hdfs-scan-node-mt.h/cc
- Both scan nodes inherit from HdfsScanNodeBase
- Rework the allocation of template tuples such that the memory
  is drawn from a new mem pool in the scanners, and each scanner
  clones the partition expr contexts. Before, the memory was taken
  from the parent scan node's mem pool, and there was only one
  instance of the partition expr contexts. Access to them was
  protected by a lock, but not in all instances, so their use was
  not always obviously correct. The change in this patch makes
  thread safety obvious and helps move a lock into the
  multi-threaded scan node which would otherwise have to remain in
  the HdfsScanNodeBase class.
- Simplify a couple of loops with C++11 for-each

Testing: A private core/hdfs run passed. I ran TPC-H/DS and
test_scanners.py on ASAN several times locally.

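The structure described above (shared state in a base class, and a single-threaded node that pulls rows through a scanner's GetNext() in the calling thread) might be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the code from the patch: the names Scanner, ScanNodeBase, SingleThreadedScanNode and Drain are hypothetical, rows are plain ints, and asynchronous I/O, mem pools and expr contexts are left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scanner over one scan range; "rows" are plain ints.
class Scanner {
 public:
  explicit Scanner(std::vector<int> rows) : rows_(std::move(rows)) {}

  // Produces the next row in the caller's thread; returns false when exhausted.
  bool GetNext(int* row) {
    assert(row != nullptr);
    if (pos_ >= rows_.size()) return false;
    *row = rows_[pos_++];
    return true;
  }

 private:
  std::vector<int> rows_;
  std::size_t pos_ = 0;
};

// Hypothetical base class: holds what every scan node variant shares
// (in this sketch, only the list of scan ranges).
class ScanNodeBase {
 public:
  explicit ScanNodeBase(std::vector<std::vector<int>> ranges)
      : ranges_(std::move(ranges)) {}
  virtual ~ScanNodeBase() = default;

  // Returns false once all ranges have been consumed.
  virtual bool GetNext(int* row) = 0;

 protected:
  std::vector<std::vector<int>> ranges_;
};

// Hypothetical single-threaded node: no scanner threads and no row queue.
// Each call advances the current scanner directly in the calling thread.
class SingleThreadedScanNode : public ScanNodeBase {
 public:
  explicit SingleThreadedScanNode(std::vector<std::vector<int>> ranges)
      : ScanNodeBase(std::move(ranges)) {}

  bool GetNext(int* row) override {
    while (true) {
      if (scanner_ == nullptr) {
        if (next_range_ >= ranges_.size()) return false;  // all ranges done
        scanner_.reset(new Scanner(ranges_[next_range_++]));
      }
      if (scanner_->GetNext(row)) return true;
      scanner_.reset();  // current range exhausted; move on to the next one
    }
  }

 private:
  std::unique_ptr<Scanner> scanner_;  // owned by this node, never shared
  std::size_t next_range_ = 0;
};

// Drains any scan node through the base-class interface.
std::vector<int> Drain(ScanNodeBase* node) {
  std::vector<int> out;
  int row = 0;
  while (node->GetNext(&row)) out.push_back(row);
  return out;
}
```

Because the scanner in this sketch is owned and advanced by a single caller, no lock is needed in the base class. That mirrors the motivation given above for moving the lock into the multi-threaded node.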
Change-Id: I98cc7f970e1575dd83875609985e1877ada3d5e0
Reviewed-on: http://gerrit.cloudera.org:8080/4113
Reviewed-by: Alex Behm <[email protected]>
Tested-by: Alex Behm <[email protected]>
---
M be/src/exec/CMakeLists.txt
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/base-sequence-scanner.h
M be/src/exec/exec-node.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-lzo-text-scanner.cc
M be/src/exec/hdfs-lzo-text-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
A be/src/exec/hdfs-scan-node-base.cc
A be/src/exec/hdfs-scan-node-base.h
A be/src/exec/hdfs-scan-node-mt.cc
A be/src/exec/hdfs-scan-node-mt.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exprs/expr-context.h
M be/src/runtime/tuple.h
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/com/cloudera/impala/service/Frontend.java
31 files changed, 1,837 insertions(+), 1,474 deletions(-)

Approvals:
  Alex Behm: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/4113
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I98cc7f970e1575dd83875609985e1877ada3d5e0
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
