Tim Armstrong has posted comments on this change. Change subject: IMPALA-3567: Part 1: groundwork to make Join build sides DataSinks ......................................................................
Patch Set 7: (13 comments) http://gerrit.cloudera.org:8080/#/c/3282/7/be/src/exec/blocking-join-node.h File be/src/exec/blocking-join-node.h: Line 115: template <bool ASYNC_BUILD> > why does this need to be a template? It's to avoid the overhead of timers that are only used for async builds. See IMPALA-2407 for some of the issues they caused. Updated comment. http://gerrit.cloudera.org:8080/#/c/3282/7/be/src/exec/data-sink.h File be/src/exec/data-sink.h: Line 40: /// Superclass of all data sinks. > i think it's time to provide this with a real class comment (what is the ab I added a short description. I feel like the second paragraph is redundant with the method comments but I'll let you decide if you think it's useful (I typically read method comments before class comments). Line 47: virtual std::string Name() = 0; > GetName(), otherwise it sounds like a getter Done Line 51: virtual Status Prepare(RuntimeState* state, MemTracker* mem_tracker); > explain param; also, why does it get created outside this class? possibly e Just for flexibility. We don't want to add another layer of memtrackers in join nodes by giving each sink its own memtracker. Line 59: virtual Status NextBatchToSend(RowBatch** batch); > i don't understand the desired control flow, please explain in class commen I added to the class comment, this comment and the Send() comment also mention the relationship between the two methods. I'm not in love with this interface but nested types is sensitive to the overhead of creating these batches so I wanted to avoid making perf-sensitive changes in this patch. I had trouble finding a alternative way to deal with this that wouldn't require larger-scale changes. http://gerrit.cloudera.org:8080/#/c/3282/7/be/src/exec/hdfs-table-sink.h File be/src/exec/hdfs-table-sink.h: Line 131: /// Prepares output_exprs and partition_key_exprs, and connects to HDFS. > leave blank line between function declaration and the comment for the next Done Line 142: /// TODO: Move calls to functions that can fail in Close() to FlushFinal() > is this todo still appropriate? It's IMPALA-2988 http://gerrit.cloudera.org:8080/#/c/3282/7/be/src/exec/nested-loop-join-builder.h File be/src/exec/nested-loop-join-builder.h: Line 25: /// Builder for the NestedLoopJoinNode operator. > explain builder (or use different term; you're basically just repeating the Done. Line 32: class NestedLoopJoinBuilder : public DataSink { > fine to abbreviate to Nlj Done Line 49: /// Returns the next batch from 'build_batch_cache_' > don't reference internal state Done. Line 50: inline RowBatch* GetNextBatchForBuild() { return build_batch_cache_.GetNextBatch(); } > this is going beyond DataSink functionality, and should be described in the I extended the method comments to explain the context. http://gerrit.cloudera.org:8080/#/c/3282/7/be/src/exec/nested-loop-join-node.h File be/src/exec/nested-loop-join-node.h: Line 35: /// null-aware left anti-join. > missing reference to builder class Done. I'd personally lean towards not cross-referencing classes like this. Seems likely to get out of sync with the code over time. It doesn't bother me since I don't tend to read these comments. http://gerrit.cloudera.org:8080/#/c/3282/7/be/src/runtime/plan-fragment-executor.h File be/src/runtime/plan-fragment-executor.h: Line 196: boost::scoped_ptr<RowBatch> row_batch_; > separate sink-related items visually w/ blank line Done -- To view, visit http://gerrit.cloudera.org:8080/3282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9d7608181eeacfe706a09c1e153d0a3e1ee9b475 Gerrit-PatchSet: 7 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
