Tim Armstrong has posted comments on this change. Change subject: IMPALA-3567: Part 1: groundwork to make Join build sides DataSinks ......................................................................
Patch Set 11: (37 comments) Addressed most of the comments. There is still an unresolved question around whether the DataSink interface should allocate row batches. I spoke to Marcel about it and we didn't reach a conclusion about which solution would be better, so I've left that unchanged in this patch but cleaned up the other issues. http://gerrit.cloudera.org:8080/#/c/3282/8/be/src/exec/blocking-join-node.h File be/src/exec/blocking-join-node.h: Line 23: #include "exec/data-sink.h" > forward declare DataSink This became necessary in a later iteration of the patch, since we declare a subclass of DataSink in the header. Line 98: /// Prepare() work, e.g. codegen. > Nit, but I don't think we construct the "build side". The build side is an Done. Renamed ConstructBuildSide to ConstructJoinBuild Line 104: /// Subclasses should close any other structures and then call > ... the Open() and build ... Not quite sure what you were suggesting, tried to make the wording a bit better. Line 105: /// BlockingJoinNode::Close(). > permit (plural) Done Line 106: virtual void Close(RuntimeState* state); > Suggest: If the build_sink is non-NULL, uses it to construct the join build Done PS8, Line 109: Changed to JoinBuildSink to be more accurate. Line 112: TJoinOp::type join_op_; > ... to construct the join build via a data sink. Done Line 118: > left child (probe side). Done Line 123: TupleRow* current_probe_row_; // The row currently being probed > typo: extra '.' Done Line 196: > nit: I don't think we use the DataSink interface to construct the join buil Done http://gerrit.cloudera.org:8080/#/c/3282/11/be/src/exec/blocking-join-node.h File be/src/exec/blocking-join-node.h: Line 42: /// BlockingJoinNode. Reset() is added to support subplans and more efficient allocation > remove "is added" Done Line 43: /// of RowBatch objects is enabled with the GetNextBatchForBuild() interface. > remove "is enabled" Reworded to avoid it. Line 51: /// Frees resources associated with cache row batches. > cached row batches Done Line 52: /// Subclasses must call JoinBuildSink::Close(). > is the order of Close() calls important? If so, mention which Close() shoul Not really, but documented it for consistency. Line 55: /// Returns the next build batch that should be filled and passed to AddBuildBatch() > Returns an empty build batch that should be filled and passed to Send(). Done Line 56: /// or Send(). For join types that accumulates batches, this helps avoid additional > accumulate Done Line 60: inline RowBatch* GetNextBatchForBuild() { return build_batch_cache_.GetNextBatch(); } > GetEmptyBatch() or GetNextEmptyBatch()? The important part seems to be that Renamed to GetNextEmptyBatch(). Still looking at the alternative with the DataSink interface and going back-and-forth. I think adding it to the DataSink interface results in less code, but prevents some usage patterns of the DataSink interface that may make sense in the future. E.g. if the caller of the DataSink methods wants to have multiple batches in flight - maybe it's pulling data from multiple queues? Line 70: const bool using_builder_interface_; > Isn't the accumulates_batches_ flag sufficient? The comment on accumulates_ No, they're different. The DataSink interface doesn't have a GetNextBatchForBuild() method, so we can't assume that the DataSink client allocated a batch in that way - it can pass in an arbitrary row batch. This flag implies that we can assume that the caller allocated the batch from the cache. E.g. for PartitionedHashJoinNode, using_builder_interface_ will be true, but accumulates_batches_ will be false, since the contents of the batches are always copied. PS11, Line 70: using_builder_interface_ I renamed this to using_join_build_interface for consistency. Line 73: /// If true, the caller should call NextBuildBatch() to get an empty batch before every Updated this comment to reflect that only BlockingJoinNode can call GetNextEmptyBatch(). Line 76: const bool accumulates_batches_; > Couldn't this be done by making GetNextBatchForBuild() virtual? My thinking I did that in an earlier version of the patch. The concern with that approach was that a subplan-specific optimisation was complicating the interface for all data sinks, most of which can never appear in a subplan. The virtue of the current approach is that the complication for this optimisation is isolated to blocking-join-node.{cc,h} Line 150: virtual Status ConstructBuildSide(RuntimeState* state) = 0; > Terminology nit that is not your fault: To me "Build Side" means "the exec Did this. Line 154: /// If build_sink is non-NULL, the sink interface will be used to construct the build > If build_sink is non-NULL, uses it to construct the build side. Done http://gerrit.cloudera.org:8080/#/c/3282/8/be/src/exec/data-sink.cc File be/src/exec/data-sink.cc: Line 166 > single line This was removed in a later version of the patch. http://gerrit.cloudera.org:8080/#/c/3282/8/be/src/exec/data-sink.h File be/src/exec/data-sink.h: Line 71: virtual Status FlushFinal(RuntimeState* state) = 0; > The important piece seems to be that the returned batch is empty and is exp The method was removed in the later patchset. Will do this if we decide to revert to this approach. Line 104: /// Owned by the RuntimeState. > const methods Done http://gerrit.cloudera.org:8080/#/c/3282/11/be/src/exec/hash-join-node.cc File be/src/exec/hash-join-node.cc: Line 207: RETURN_IF_ERROR(BlockingJoinNode::GetFirstProbeRow(state)); > Seems a little weird that we prepare for probe after getting the first prob ResetForProbe() looks up the first probe row in the hash table for GetNext(), so it needs to be called after GetFirstProbeRow(). This keeps the flow more analogous to PHJ (although ResetForProbe() is called there once per batch). http://gerrit.cloudera.org:8080/#/c/3282/8/be/src/exec/hash-join-node.h File be/src/exec/hash-join-node.h: Line 70: /// holds everything referenced from build side > Can you spell out 'everything' a little more? I don't actually know. This was a mechanical refactor to move this from BlockingJoinNode since it's not used by any other subclass. Hoping to delete this module soon. Line 120: void ResetForProbe(); > PrepareForProbe()? Renamed to InitGetNext(). http://gerrit.cloudera.org:8080/#/c/3282/11/be/src/exec/hash-join-node.h File be/src/exec/hash-join-node.h: Line 70: /// holds everything referenced from build side > can you spell out 'everything' a little more? is it memory referenced from Replied on the other patchset, but this was just a mechanical refactor to move it from BlockingJoinNode. Doesn't seem worth it to clean this up. http://gerrit.cloudera.org:8080/#/c/3282/11/be/src/exec/hdfs-table-writer.cc File be/src/exec/hdfs-table-writer.cc: Line 20: #include "runtime/row-batch.h" > remove? Done. Maybe was an editing error. http://gerrit.cloudera.org:8080/#/c/3282/11/be/src/exec/nested-loop-join-builder.h File be/src/exec/nested-loop-join-builder.h: Line 24: /// Builder for the NestedLoopJoinNode operator that accumulates the build-side rows > remove "operator" Done http://gerrit.cloudera.org:8080/#/c/3282/11/be/src/exec/nested-loop-join-node.cc File be/src/exec/nested-loop-join-node.cc: Line 59: SCOPED_TIMER(runtime_profile_->total_time_counter()); > won't this double count the time spent in BlockingJoinNode::Open()? I think The general pattern (E.g. ExecNode::Open(), BlockingJoinNode::Open()) is that the abstract base classes don't use timers, so I think this is right as-is. http://gerrit.cloudera.org:8080/#/c/3282/11/be/src/exec/nested-loop-join-node.h File be/src/exec/nested-loop-join-node.h: Line 59: boost::scoped_ptr<NljBuilder> build_side_; > build_sink_? Renamed to builder_ to match the class name. http://gerrit.cloudera.org:8080/#/c/3282/11/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 242: SCOPED_TIMER(runtime_profile_->total_time_counter()); > I think the counter should start after BlockingJoinNode::Open() Why is that? It seems strange not to include the time spent in the parent class. http://gerrit.cloudera.org:8080/#/c/3282/11/be/src/exec/row-batch-cache.h File be/src/exec/row-batch-cache.h: Line 61: row_batches_.clear(); // unique_ptr automatically calls all destructors. > Add a DCHECK in the RowBatchCache d'tor to make sure row_batches_ is empty? Done http://gerrit.cloudera.org:8080/#/c/3282/11/be/src/util/stopwatch.h File be/src/util/stopwatch.h: Line 241: ScopedStopWatch(T* sw, bool enabled = true) : > The enabled flag seems awkward because it's not clear why anyone would use If you used ScopedStopWatch directly it would still be useful in any situation where you want to have a scoped timer but also make a decision about whether to enable a timer at runtime (or at compile time with a template argument). Otherwise you have to mess around ifdefs or very carefully turn the timer on and off at the right places. You could mess about with template types and have a DummyScopedStopWatch type, but that seems clunkier and less general (since you can't make a runtime decision to enable the timer then). -- To view, visit http://gerrit.cloudera.org:8080/3282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9d7608181eeacfe706a09c1e153d0a3e1ee9b475 Gerrit-PatchSet: 11 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
