Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3567: Part 1: groundwork to make Join build sides 
DataSinks
......................................................................


Patch Set 7:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/3282/7/be/src/exec/blocking-join-node.h
File be/src/exec/blocking-join-node.h:

Line 115:   template <bool ASYNC_BUILD>
> why does this need to be a template?
It's to avoid the overhead of timers that are only used for async builds. See 
IMPALA-2407 for some of the issues they caused. Updated comment.


http://gerrit.cloudera.org:8080/#/c/3282/7/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

Line 40: /// Superclass of all data sinks.
> i think it's time to provide this with a real class comment (what is the ab
I added a short description. I feel like the second paragraph is redundant with 
the method comments but I'll let you decide if you think it's useful (I 
typically read method comments before class comments).


Line 47:   virtual std::string Name() = 0;
> GetName(), otherwise it sounds like a getter
Done


Line 51:   virtual Status Prepare(RuntimeState* state, MemTracker* mem_tracker);
> explain param; also, why does it get created outside this class? possibly e
Just for flexibility. We don't want to add another layer of memtrackers in join 
nodes by giving each sink its own memtracker.


Line 59:   virtual Status NextBatchToSend(RowBatch** batch);
> i don't understand the desired control flow, please explain in class commen
I added to the class comment, this comment and the Send() comment also mention 
the relationship between the two methods.

I'm not in love with this interface but nested types is sensitive to the 
overhead of creating these batches so I wanted to avoid making perf-sensitive 
changes in this patch.

I had trouble finding a alternative way to deal with this that wouldn't require 
larger-scale changes.


http://gerrit.cloudera.org:8080/#/c/3282/7/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

Line 131:   /// Prepares output_exprs and partition_key_exprs, and connects to 
HDFS.
> leave blank line between function declaration and the comment for the next 
Done


Line 142:   /// TODO: Move calls to functions that can fail in Close() to 
FlushFinal()
> is this todo still appropriate?
It's IMPALA-2988


http://gerrit.cloudera.org:8080/#/c/3282/7/be/src/exec/nested-loop-join-builder.h
File be/src/exec/nested-loop-join-builder.h:

Line 25: /// Builder for the NestedLoopJoinNode operator.
> explain builder (or use different term; you're basically just repeating the
Done.


Line 32: class NestedLoopJoinBuilder : public DataSink {
> fine to abbreviate to Nlj
Done


Line 49:   /// Returns the next batch from 'build_batch_cache_'
> don't reference internal state
Done.


Line 50:   inline RowBatch* GetNextBatchForBuild() { return 
build_batch_cache_.GetNextBatch(); }
> this is going beyond DataSink functionality, and should be described in the
I extended the method comments to explain the context.


http://gerrit.cloudera.org:8080/#/c/3282/7/be/src/exec/nested-loop-join-node.h
File be/src/exec/nested-loop-join-node.h:

Line 35: /// null-aware left anti-join.
> missing reference to builder class
Done.

I'd personally lean towards not cross-referencing classes like this. Seems 
likely to get out of sync with the code over time. It doesn't bother me since I 
don't tend to read these comments.


http://gerrit.cloudera.org:8080/#/c/3282/7/be/src/runtime/plan-fragment-executor.h
File be/src/runtime/plan-fragment-executor.h:

Line 196:   boost::scoped_ptr<RowBatch> row_batch_;
> separate sink-related items visually w/ blank line
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9d7608181eeacfe706a09c1e153d0a3e1ee9b475
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to