Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
......................................................................
Patch Set 9: (17 comments)

http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 206: Status status = Status::MemLimitExceeded();
> let's make the error reporting for PrepareForWrite() and PrepareForRead() m
Good point, this discrepancy crept in during a rebase.

http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

Line 165: if (expr_mem_tracker_ != NULL) {
> the function comment makes it sound like we should also have:
Done. Makes it more obvious.

http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/nested-loop-join-builder.cc
File be/src/exec/nested-loop-join-builder.cc:

Line 35
> revert?
I just fixed the formatting while I was looking at it; I think the new formatting is more conventional.

http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

Line 310: for (unique_ptr<Partition>& partition: all_partitions_) partition->Close(NULL);
> nit: extra space before 'partition'
Done

Line 464: // SwitchToIoBuffers() call below will succeed.
> Does this comment make sense? Can we use small buffers and have a stream in
Yes (aside from SwitchToIoBuffers() being a weird API). The build stream started off with small buffers, so it might not have switched to I/O buffers at this point. I think it's unlikely that we'll spill a partition that hadn't switched to I/O buffers, but it's possible. This will go away with the new buffer pool.

http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

Line 46: class PhjBuilder : public DataSink {
> PartitionedHashJoinBuilder for consistency?
It's for consistency with NljBuilder (Marcel suggested that name on the previous review).

Line 69: /// This function walks the current hash partitions and picks one to spill.
> ...
and spills the largest one
Done, although I made it clear that this is a policy choice.

Line 174: /// transferring ownership to them.
> this line doesn't seem to add anything
Done

Line 189: /// all blocks aside from the current read or write block are left pinned.
This comment was wrong - fixed it.

http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 865: Status PartitionedHashJoinNode::BuildHashTablesAndPrepareProbePartitions() {
> It would be good to document what exactly is handed off from the PHJ builde
I agree that should be the end goal. For now, I think it's hard to do better in the spilling case than this approach of allocating probe buffers, then hash tables, then the remaining probe buffers, until we have reservations. When we need to implement the in-memory broadcast join, we could add a temporary in-memory codepath that does the hash table build purely in the builder but bails out if any partition spills. One stopgap option (feasible but ugly) that lets us implement the current algorithm in the builder is to allocate the initial probe Blocks in the builder and then hand them over to the join. PrepareForWrite() could then optionally take a Block as an argument. What do you think of doing that?

Line 891: for (int i = 0; i < PARTITION_FANOUT; ++i) {
I moved the hash table build calls into the builder as an incremental step towards separating this more cleanly.

Line 959: RETURN_IF_ERROR(builder_->SpillPartition(&spilled_partition_idx));
> It feels like there must be a simpler and more efficient way to handle this
Yeah, if we know how much memory we have to work with, we can precompute the sizes of the hash tables and do some kind of bin-packing. I don't think that's feasible until we have reservations.
http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

Line 79: /// Logically, the algorithm has the following modes:.
> would be good to document somewhere (not necessarily here) what phases are
I added a comment to the builder class to better explain the responsibilities of the builder.

Line 150: bool AppendProbeRow(BufferedTupleStream* stream, TupleRow* row, Status* status);
I changed this to just return a Status, since there's no situation where it can return false but have an OK status.

Line 445: /// Close the partition and attach resources to 'batch' if non-NULL or free the
> can you make the comment on BuildPartition::Close() similar to this one? th
Done

http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/runtime/buffered-tuple-stream.h
File be/src/runtime/buffered-tuple-stream.h:

Line 231: /// Prepares the stream for reading. Called after Init() and before the first AddRow()
> maybe expand to:
Done

http://gerrit.cloudera.org:8080/#/c/3873/9/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

Line 400: AddChild(child, indent);
> shouldn't the addition and the swap be atomic under children_lock_?
Good point. I added an internal AddChildLocked() method and made sure to hold children_lock_ for the duration of this method.

--
To view, visit http://gerrit.cloudera.org:8080/3873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
