Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3905: Add single-threaded scan node. ......................................................................
Patch Set 6: (22 comments) http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: Line 287: if (state->query_options().mt_num_cores > 1) { i thought we wanted to have this be >= 1 (and change the default to 0) http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: Line 154: Status HdfsScanNodeBase::Prepare(RuntimeState* state) { this is insanely long. leave todo: break this up. Line 373: for (auto& filter: filter_ctxs_) RETURN_IF_ERROR(filter.expr->Open(state)); don't use auto here Line 682: /// This function may be called from multiple threads, and we expect each this also sounds like it's specific to the old scan node. Line 763: lock_guard<SpinLock> l(file_type_counts_lock_); this seems specific to having multiple scanner threads. it would be good to outline the eventual clean-up, when scanner threads go away. either with a central todo a the top (of the .h file) that lists everything, or with todos here. http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 95: /// into a roe batch queue (via HdfsScanner::ProcessSplit()). Those specifics RowBatch Line 125: virtual bool IsMultiThreaded() const = 0; can we pick a different name for that? i'm worried about causing confusion (with other mt-related things, and this is specifically not relevant in the mt path). something related to using a rowbatch queue? Line 216: /// Called by scanners when a range is complete. Used to trigger done_ and done_ has migrated into the old scanner, and you explicitly reference a rowbatch queue. update/reword comment. Line 242: typedef boost::unordered_map<int32_t, std::pair<int, int64_t>> PerVolumnStats; std:: Line 266: const std::vector<FilterContext> filter_ctxs() const { return filter_ctxs_; } const& ? Line 304: boost::unordered_set<int64_t> partition_ids_; std:: Line 405: SpinLock file_type_counts_lock_; isn't this specific to having multiple scanner threads? Line 447: /// This must be called on Close() to unregister counters. mention synchronization requirement? http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node-mt.cc File be/src/exec/hdfs-scan-node-mt.cc: Line 73: // TODO: This is probably not worth splitting the organisational cost of splitting intentional duplication? Line 74: // initialisation across two places. Move to before the scanner threads start. reference to scanner threads Line 84: RETURN_IF_ERROR(runtime_state_->io_mgr()->GetNextRange( single line? if not, break before call/after macro? http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: Line 716 where did this go? Line 92: RETURN_IF_ERROR(IssueInitialScanRanges(state)); much better http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node.h File be/src/exec/hdfs-scan-node.h: Line 44: /// HDFS-serialised data. mention that this is not used on the non-mt path. Line 61: /// recycled. todo: remove once mt is fully functional http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: Line 116: /// including the final one in Close(). this comment seems unrelated to the c'tor. http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/scanner-context.h File be/src/exec/scanner-context.h: Line 299: /// Always returns false if the scan_node_ is not multi-threaded. this also deserves a clean-up todo. -- To view, visit http://gerrit.cloudera.org:8080/4113 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I98cc7f970e1575dd83875609985e1877ada3d5e0 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
