Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3567: Part 1: groundwork to make Join build sides 
DataSinks
......................................................................


Patch Set 2:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/blocking-join-node.cc
File be/src/exec/blocking-join-node.cc:

Line 195:           &build_side_status));
> indent 2 more
Done.


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/blocking-join-node.h
File be/src/exec/blocking-join-node.h:

Line 67:   boost::scoped_ptr<RowBatch> build_batch_;
> is this only utilized when the join runs in a particular 'mode' (i don't me
The two changes are unrelated. The build_pool_ was only used by the legacy 
HashJoinNode, so I moved it there. The build_batch_ is added as an optimisation 
for Reset() to avoid reallocating the batch.


Line 122:   /// Set up 'current_probe_row_' to point to the first input row 
from the left child
> you mean from probe_batch_? i don't see anything in this function that rest
Right, my goal here was to change the control flow of the join nodes only to 
the extent that I needed to add the join sink. In practice the join nodes just 
need to call this at the end of Open() to prime themselves for processing the 
probe input.

It's hard to abstract over the global state in the join node without this 
turning into even more of a rewrite of the three join nodes so it seemed better 
to document the postconditions. E.g. I remember Michael looked at consolidating 
probe_batch_pos_ and current_probe_row_ and even that was somewhat non-trivial.

It does make some assumptions about the state of the node that aren't going to 
be true if it's partway through probing. E.g. it assumes that probe_batch_ is 
empty and that probe_side_eos_ is false. So I wanted to name it in a way that 
made it clear that you should only call it at the start. I added DCHECKs for a 
couple of those assumptions.


Line 202:   void ProcessBuildInputAsync(RuntimeState* state, DataSink* 
build_sink,
> find a verb for this? ConstructBuildSide?
ProcessBuildInputAsync()


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

Line 108: 
> simply 'not owned' is fine (unless there's some special dependency that thi
Done


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 631:     PartitionPair* dummy;
> call this dummy or something to that effect, to signal that it's not needed
Done


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

Line 194:       PartitionPair** partition_pair, bool no_more_rows);
> odd indentation
Done


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/nested-loop-join-builder.cc
File be/src/exec/nested-loop-join-builder.cc:

Line 31:     MemTracker* mem_tracker) :
> move initializer to new line
Done


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/nested-loop-join-builder.h
File be/src/exec/nested-loop-join-builder.h:

Line 35: /// tuple memory, the non-copying mode is used and row batches are 
simply accumulated in
> meaning that you have to call AddBuildBatch and can't call Send?
It switches depending on whether the row batches have the 'need_to_return' flag 
set. I tried to see if there were any reasonable DCHECKs I could add to enforce 
this further but there wasn't anything straightforward.


Line 90:   /// memory that is owned by the child node.
> the meaning of 'raw' wasn't clear to me until after i read the .cc (we typi
Extended comment to clarify.


Line 92: 
> "deep copied out of raw_build_batches_"
Done


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 247:   for (const FilterContext& filter: filters_) 
RETURN_IF_ERROR(filter.expr->Open(state));
> single line if it fits
Done


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/row-batch-cache.h
File be/src/exec/row-batch-cache.h:

Line 70:   /// GetNextBatch() has not yet been called.
> has not yet been called
Done


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/runtime/plan-fragment-executor.cc
File be/src/runtime/plan-fragment-executor.cc:

Line 65: PlanFragmentExecutor::PlanFragmentExecutor(ExecEnv* exec_env,
> that seems generally useful. move to RowBatch?
Done


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

Line 37: 
> ?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9d7608181eeacfe706a09c1e153d0a3e1ee9b475
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to