----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15261/#review28270 -----------------------------------------------------------
>> Note that I added multiple descriptors for inbound edges but a single >> descriptor for all the outbound edges. This is because TezDagBuilder always >> creates an edge by connecting predecessors to the current vertex. Actually, I am wrong here. The outbound edges should and can have multiple descriptors because TezDagBuilder should use a outbound edge from a predecessor vertex and an inbound edge of the current vertex. I will update my patch to fix this. - Cheolsoo Park On Nov. 6, 2013, 11:04 a.m., Cheolsoo Park wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/15261/ > ----------------------------------------------------------- > > (Updated Nov. 6, 2013, 11:04 a.m.) > > > Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini > Palaniswamy. > > > Bugs: PIG-3555 > https://issues.apache.org/jira/browse/PIG-3555 > > > Repository: pig-git > > > Description > ------- > > Initial implementation of Tez combiner optimizer. The patch includes the > following changes- > * Factored out CombinerOptimizer code into a utility class called > CombinerOptimizerUtil. So both MR and Tez CombinerOptimizer use this utility > class instead of duplicating code. > * Introduced a new class called TezEdgeDescriptor that holds combine plans as > well as various edge properties. > * Added TezEdgeDescriptors to TezOperator. Note that I added multiple > descriptors for inbound edges but a single descriptor for all the outbound > edges. This is because TezDagBuilder always creates an edge by connecting > predecessors to the current vertex. Please let me know if you think we should > allow multiple descriptors for outbound edges too. > * Refactored some code in TezDagBuilder while touching it. > > > Diffs > ----- > > > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java > 18a382b > > src/org/apache/pig/backend/hadoop/executionengine/tez/CombinerOptimizer.java > e69de29 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java > 0b1f3c9 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java > 45e47b0 > > src/org/apache/pig/backend/hadoop/executionengine/tez/TezEdgeDescriptor.java > e69de29 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java > 3f14644 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java > e612d88 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezPrinter.java > 5a42ded > > src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java > e69de29 > test/org/apache/pig/test/data/GoldenFiles/TEZC1.gld 925f07e > test/org/apache/pig/test/data/GoldenFiles/TEZC2.gld a3974fe > test/org/apache/pig/test/data/GoldenFiles/TEZC3.gld a8c942b > test/org/apache/pig/test/data/GoldenFiles/TEZC4.gld fb7c903 > test/org/apache/pig/test/data/GoldenFiles/TEZC5.gld e6cd25e > > Diff: https://reviews.apache.org/r/15261/diff/ > > > Testing > ------- > > ant test-tez passes. > ant test-e2e-tez passes. > > I didn't add new test cases, but an e2e test case (Checkin_3) includes an > algebraic udf (count) following group-by. I also manually tested it on a live > cluster. > > > Thanks, > > Cheolsoo Park > >
