Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3905: Add single-threaded scan node.
......................................................................


Patch Set 6:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

Line 287:       if (state->query_options().mt_num_cores > 1) {
i thought we wanted to have this be >= 1 (and change the default to 0)


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 154: Status HdfsScanNodeBase::Prepare(RuntimeState* state) {
this is insanely long. leave todo: break this up.


Line 373:   for (auto& filter: filter_ctxs_) 
RETURN_IF_ERROR(filter.expr->Open(state));
don't use auto here


Line 682:     /// This function may be called from multiple threads, and we 
expect each
this also sounds like it's specific to the old scan node.


Line 763:     lock_guard<SpinLock> l(file_type_counts_lock_);
this seems specific to having multiple scanner threads.

it would be good to outline the eventual clean-up, when scanner threads go 
away. either with a central todo a the top (of the .h file) that lists 
everything, or with todos here.


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 95: /// into a roe batch queue (via HdfsScanner::ProcessSplit()). Those 
specifics
RowBatch


Line 125:   virtual bool IsMultiThreaded() const = 0;
can we pick a different name for that? i'm worried about causing confusion 
(with other mt-related things, and this is specifically not relevant in the mt 
path).

something related to using a rowbatch queue?


Line 216:   /// Called by scanners when a range is complete.  Used to trigger 
done_ and
done_ has migrated into the old scanner, and you explicitly reference a 
rowbatch queue. update/reword comment.


Line 242:   typedef boost::unordered_map<int32_t, std::pair<int, int64_t>> 
PerVolumnStats;
std::


Line 266:   const std::vector<FilterContext> filter_ctxs() const { return 
filter_ctxs_; }
const& ?


Line 304:   boost::unordered_set<int64_t> partition_ids_;
std::


Line 405:   SpinLock file_type_counts_lock_;
isn't this specific to having multiple scanner threads?


Line 447:   /// This must be called on Close() to unregister counters.
mention synchronization requirement?


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node-mt.cc
File be/src/exec/hdfs-scan-node-mt.cc:

Line 73:     // TODO: This is probably not worth splitting the organisational 
cost of splitting
intentional duplication?


Line 74:     // initialisation across two places. Move to before the scanner 
threads start.
reference to scanner threads


Line 84:     RETURN_IF_ERROR(runtime_state_->io_mgr()->GetNextRange(
single line? if not, break before call/after macro?


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

Line 716
where did this go?


Line 92:     RETURN_IF_ERROR(IssueInitialScanRanges(state));
much better


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node.h
File be/src/exec/hdfs-scan-node.h:

Line 44: /// HDFS-serialised data.
mention that this is not used on the non-mt path.


Line 61: /// recycled.
todo: remove once mt is fully functional


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

Line 116:   /// including the final one in Close().
this comment seems unrelated to the c'tor.


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/scanner-context.h
File be/src/exec/scanner-context.h:

Line 299:   /// Always returns false if the scan_node_ is not multi-threaded.
this also deserves a clean-up todo.


-- 
To view, visit http://gerrit.cloudera.org:8080/4113
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I98cc7f970e1575dd83875609985e1877ada3d5e0
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to