Alex Behm has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
......................................................................


Patch Set 9:

(15 comments)

I'm not quite done with reviewing everything, but getting close.

http://gerrit.cloudera.org:8080/#/c/3873/9//COMMIT_MSG
Commit Message:

Line 22: spill additional partitions during probing if the probe partitions 
needed
your new solution is much saner, thank you!


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 206:     Status status = Status::MemLimitExceeded();
let's make the error reporting for PrepareForWrite() and PrepareForRead() more 
similar, e.g., by adding a PREPARE_FOR_WRITE_FAILED_ERROR_MSG and doing the 
same as for the read case


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

Line 165:   if (expr_mem_tracker_ != NULL) {
the function comment makes it sound like we should also have:

if (closed_) return;


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/nested-loop-join-builder.cc
File be/src/exec/nested-loop-join-builder.cc:

Line 35
revert?


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

Line 310:   for (unique_ptr<Partition>& partition: all_partitions_)  
partition->Close(NULL);
nit: extra space before 'partition'


Line 464:   // SwitchToIoBuffers() call below will succeed.
Does this comment make sense? Can we use small buffers and have a stream in the 
same partition?


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

Line 46: class PhjBuilder : public DataSink {
PartitionedHashJoinBuilder for consistency?


Line 69:   /// This function walks the current hash partitions and picks one to 
spill.
... and spills the largest one


Line 174:     /// transferring ownership to them.
this line doesn't seem to add anything


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 865: Status 
PartitionedHashJoinNode::BuildHashTablesAndPrepareProbePartitions() {
It would be good to document what exactly is handed off from the PHJ builder to 
the PHJ node (e.g., in the PHJ builder class comment)

To me it would make sense to move the actual HT construction into the builder 
as well to accommodate the in-memory broadcast join scenario (the probe threads 
expect the HT to be built).

I think we should strive to not spill build partitions in the hash join node to 
maintain a clean separation, but I realize this would be a drastic departure 
from the existing code since we re-partition spilled partitions lazily during 
the probe. 

If we had reservations, it seems like we could determine the re-partitioning 
requirements before the probe.


Line 959:       
RETURN_IF_ERROR(builder_->SpillPartition(&spilled_partition_idx));
It feels like there must be a simpler and more efficient way to handle this 
case - we are going to spill the largest partition here, dropping the 
constructed HT.

At the same time it's really an edge case, so probably better to defer. It's 
certainly better than before :)


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

Line 79:   /// Logically, the algorithm has the following modes:.
would be good to document somewhere (not necessarily here) what phases are 
executed where, i.e., in the builder or the join node. seems like we'll end up 
with the build phase finishing the level0 partitioning and HT construction, and 
then the join may need to repartition probe and build partitions


Line 445:     /// Close the partition and attach resources to 'batch' if 
non-NULL or free the
can you make the comment on BuildPartition::Close() similar to this one? this 
new one is much better


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/runtime/buffered-tuple-stream.h
File be/src/runtime/buffered-tuple-stream.h:

Line 231:   /// Prepares the stream for reading. Called after Init() and before 
the first AddRow()
maybe expand to:

Prepares the stream for writing by attempting to pin a block to buffer the 
writes.


http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

Line 400:   AddChild(child, indent);
shouldn't the addition and the swap be atomic under children_lock_?

the interfaces existing interfaces seem to make atomicity difficult (AddChild() 
takes and releases lock)


-- 
To view, visit http://gerrit.cloudera.org:8080/3873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to