Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3282

Change subject: IMPALA-3567: Part 1: groundwork to make Join build sides DataSinks
......................................................................

IMPALA-3567: Part 1: groundwork to make Join build sides DataSinks

Refactor the DataSink interface to be more generic. We need more
flexibility in setting up MemTrackers, so that memory is accounted
against the right ExecNode. The NestedLoopJoinNode also differs from
other joins in that it recycles row batches to avoid copies, so we need
to add a NextBatchToSend() method to the DataSink that returns a batch.
Also remove some redundancy between DataSink subclasses in setting up
RuntimeProfiles, etc.

Remove the redundancy in the DataSink between passing eos to GetNext()
and FlushFinal(). This simplifies HdfsTableSink quite a bit and makes
handling empty batches simpler.

Partially refactor join nodes so that control flow between
BlockingJoinNode::Open() and its subclasses is easier to follow.
BlockingJoinNode now only calls one virtual function on its subclasses:
ConstructBuildSide(). Once we convert all join nodes to use the DataSink
interface, we will be able to remove that as well.

As a minor optimisation, avoid updating a timer that is ignored for
non-async builds.

As a proof of concept, this patch separates out the build side of
NestedLoopJoinNode into a class that implements the DataSink interface.
Refactoring the hash join is left for Part 2.

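For readers skimming the description, here is a rough sketch of the shape it describes. It is not taken from the patch. Only the names NextBatchToSend(), FlushFinal(), ConstructBuildSide() and BlockingJoinNode::Open() come from the description above. The Send() method, the int-based RowBatch stand-in, and the *Sketch classes are assumptions made for illustration. The real interfaces presumably also take runtime state and return status codes, which are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row batch; real batches hold tuples, not ints.
struct RowBatch {
  std::vector<int> rows;
};

// Simplified sink interface (assumed shape). Send() has no eos flag:
// end-of-stream is signalled only by FlushFinal().
class DataSink {
 public:
  virtual ~DataSink() = default;
  // Returns a batch owned by the sink for the producer to fill.
  virtual RowBatch* NextBatchToSend() = 0;
  // Hands back the batch obtained from NextBatchToSend().
  virtual void Send(RowBatch* batch) = 0;
  // Called once, after the last Send().
  virtual void FlushFinal() = 0;
};

// Hypothetical build side of a nested-loop join: keeps the batches it is
// sent instead of copying their rows.
class NljBuilderSketch : public DataSink {
 public:
  RowBatch* NextBatchToSend() override {
    if (!pending_) pending_ = std::make_unique<RowBatch>();
    return pending_.get();
  }

  void Send(RowBatch* batch) override {
    assert(!flushed_);
    assert(batch == pending_.get());
    if (batch->rows.empty()) return;  // Empty batch: keep it for reuse.
    build_batches_.push_back(std::move(pending_));  // Take ownership, no copy.
  }

  void FlushFinal() override { flushed_ = true; }

  std::size_t num_batches() const { return build_batches_.size(); }
  std::size_t num_rows() const {
    std::size_t n = 0;
    for (const auto& b : build_batches_) n += b->rows.size();
    return n;
  }
  bool flushed() const { return flushed_; }

 private:
  std::unique_ptr<RowBatch> pending_;
  std::vector<std::unique_ptr<RowBatch>> build_batches_;
  bool flushed_ = false;
};

// Base class calls exactly one virtual function on its subclasses.
class BlockingJoinNodeSketch {
 public:
  virtual ~BlockingJoinNodeSketch() = default;
  void Open() { ConstructBuildSide(); }

 protected:
  virtual void ConstructBuildSide() = 0;
};

// Hypothetical join node that feeds canned child output into the builder.
class NljNodeSketch : public BlockingJoinNodeSketch {
 public:
  explicit NljNodeSketch(std::vector<std::vector<int>> child_output)
      : child_output_(std::move(child_output)) {}
  const NljBuilderSketch& builder() const { return builder_; }

 protected:
  void ConstructBuildSide() override {
    for (const auto& rows : child_output_) {
      RowBatch* batch = builder_.NextBatchToSend();
      batch->rows = rows;  // Stands in for the child filling the batch.
      builder_.Send(batch);
    }
    builder_.FlushFinal();  // The only end-of-stream signal.
  }

 private:
  std::vector<std::vector<int>> child_output_;
  NljBuilderSketch builder_;
};
```

In this sketch the sink hands out the batch and takes ownership of it again on Send(), which is one way to avoid copying rows into the build side. An empty batch needs no special eos handling because FlushFinal() is the single place where the end of the stream is reported.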
Change-Id: I9d7608181eeacfe706a09c1e153d0a3e1ee9b475
---
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/kudu-table-sink-test.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
A be/src/exec/nested-loop-join-builder.cc
A be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/nested-loop-join-node.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/plan-fragment-executor.cc
M be/src/runtime/plan-fragment-executor.h
M be/src/util/stopwatch.h
27 files changed, 656 insertions(+), 428 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/82/3282/5
--
To view, visit http://gerrit.cloudera.org:8080/3282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I9d7608181eeacfe706a09c1e153d0a3e1ee9b475
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <[email protected]>
