----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15261/#review28293 -----------------------------------------------------------
src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java <https://reviews.apache.org/r/15261/#comment55078> This + newEdges method does not seem to be right to me. Going with the MR combiner model, I would expect the combinePlan of the inEdge (we need to tag the TezEdgeDescriptor inEdges of the predecessor with the actual predecessor and the outedges with the actual successor) actually connecting these vertexes to be set on this Edge for both in and out. The same combinePlan should be applicable on both the outbound end (map/onfilesortedoutput) and inbound end (reduce/shufflemergeinput) of the edge connecting the vertices. - Rohini Palaniswamy On Nov. 6, 2013, 11:04 a.m., Cheolsoo Park wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/15261/ > ----------------------------------------------------------- > > (Updated Nov. 6, 2013, 11:04 a.m.) > > > Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini > Palaniswamy. > > > Bugs: PIG-3555 > https://issues.apache.org/jira/browse/PIG-3555 > > > Repository: pig-git > > > Description > ------- > > Initial implementation of Tez combiner optimizer. The patch includes the > following changes- > * Factored out CombinerOptimizer code into a utility class called > CombinerOptimizerUtil. So both MR and Tez CombinerOptimizer use this utility > class instead of duplicating code. > * Introduced a new class called TezEdgeDescriptor that holds combine plans as > well as various edge properties. > * Added TezEdgeDescriptors to TezOperator. Note that I added multiple > descriptors for inbound edges but a single descriptor for all the outbound > edges. This is because TezDagBuilder always creates an edge by connecting > predecessors to the current vertex. Please let me know if you think we should > allow multiple descriptors for outbound edges too. > * Refactored some code in TezDagBuilder while touching it. > > > Diffs > ----- > > > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java > 18a382b > > src/org/apache/pig/backend/hadoop/executionengine/tez/CombinerOptimizer.java > e69de29 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java > 0b1f3c9 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java > 45e47b0 > > src/org/apache/pig/backend/hadoop/executionengine/tez/TezEdgeDescriptor.java > e69de29 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java > 3f14644 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java > e612d88 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezPrinter.java > 5a42ded > > src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java > e69de29 > test/org/apache/pig/test/data/GoldenFiles/TEZC1.gld 925f07e > test/org/apache/pig/test/data/GoldenFiles/TEZC2.gld a3974fe > test/org/apache/pig/test/data/GoldenFiles/TEZC3.gld a8c942b > test/org/apache/pig/test/data/GoldenFiles/TEZC4.gld fb7c903 > test/org/apache/pig/test/data/GoldenFiles/TEZC5.gld e6cd25e > > Diff: https://reviews.apache.org/r/15261/diff/ > > > Testing > ------- > > ant test-tez passes. > ant test-e2e-tez passes. > > I didn't add new test cases, but an e2e test case (Checkin_3) includes an > algebraic udf (count) following group-by. I also manually tested it on a live > cluster. > > > Thanks, > > Cheolsoo Park > >
