> On Nov. 6, 2013, 10:54 p.m., Mark Wagner wrote: > > src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java, > > line 180 > > <https://reviews.apache.org/r/15261/diff/1/?file=379002#file379002line180> > > > > Does this work properly? I was thinking that the key class, comparator, > > etc would also need to be a part of this conf. They're not right now but I > > thought tez was falling back if no edge conf was present
The key class, comparator, etc are added to the payload of edges by setIntermediateInputKeyValue() and setIntermediateOutputKeyValue() calls in addCombiner(). > On Nov. 6, 2013, 10:54 p.m., Mark Wagner wrote: > > src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java, > > line 76 > > <https://reviews.apache.org/r/15261/diff/1/?file=379007#file379007line76> > > > > Does this still hold true for tez? What about a load + group bys on > > different keys? True. Can we punt this for now? > On Nov. 6, 2013, 10:54 p.m., Mark Wagner wrote: > > src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java, > > lines 306-313 > > <https://reviews.apache.org/r/15261/diff/1/?file=379002#file379002line306> > > > > Move to edge creation Right now, the vertex pipeline still has POPackage in plan, so setting input keys in the payload of vertex is necessary. > On Nov. 6, 2013, 10:54 p.m., Mark Wagner wrote: > > src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java, > > line 297 > > <https://reviews.apache.org/r/15261/diff/1/?file=379002#file379002line297> > > > > This should go with the rest of the edge creation. Anything that's > > related to shuffles or input/outputs needs to be done per edge. Now I create a new configuration object per edge and set up input/output keys there. However, I still need to set input/output keys in vertex because I do not eliminate local rearrange and package from its pipeline. Please correct me if I am misunderstanding. > On Nov. 6, 2013, 10:54 p.m., Mark Wagner wrote: > > src/org/apache/pig/backend/hadoop/executionengine/tez/CombinerOptimizer.java, > > line 39 > > <https://reviews.apache.org/r/15261/diff/1/?file=379000#file379000line39> > > > > I think a more global view of things is needed here. For example, I > > think a multi-parent node will have trouble. This could show up with a > > group by + join on the same key, where the group by may need a combiner. > > > > It may be necessary to add more information when the TezOps are > > constructed. I actually solved this in a different way. Instead of keeping the reference to the previously visited vertex, I use TezOperPlan (parent plan of each TezOperator) to retrieve predecessors while visiting a vertex. So now I can handle the multi-parent node case nicely. > On Nov. 6, 2013, 10:54 p.m., Mark Wagner wrote: > > src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java, > > line 294 > > <https://reviews.apache.org/r/15261/diff/1/?file=379002#file379002line294> > > > > We can remove this since the work I'm referencing is essentially this > > JIRA - Cheolsoo ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15261/#review28317 ----------------------------------------------------------- On Nov. 6, 2013, 11:55 p.m., Cheolsoo Park wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/15261/ > ----------------------------------------------------------- > > (Updated Nov. 6, 2013, 11:55 p.m.) > > > Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini > Palaniswamy. > > > Bugs: PIG-3555 > https://issues.apache.org/jira/browse/PIG-3555 > > > Repository: pig-git > > > Description > ------- > > Initial implementation of Tez combiner optimizer. The patch includes the > following changes- > * Factored out CombinerOptimizer code into a utility class called > CombinerOptimizerUtil. So both MR and Tez CombinerOptimizer use this utility > class instead of duplicating code. > * Introduced a new class called TezEdgeDescriptor that holds combine plans as > well as various edge properties. > * Added TezEdgeDescriptors to TezOperator. Note that I added multiple > descriptors for inbound edges but a single descriptor for all the outbound > edges. This is because TezDagBuilder always creates an edge by connecting > predecessors to the current vertex. Please let me know if you think we should > allow multiple descriptors for outbound edges too. > * Refactored some code in TezDagBuilder while touching it. > > > Diffs > ----- > > > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java > 18a382b > > src/org/apache/pig/backend/hadoop/executionengine/tez/CombinerOptimizer.java > e69de29 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java > 0b1f3c9 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java > 45e47b0 > > src/org/apache/pig/backend/hadoop/executionengine/tez/TezEdgeDescriptor.java > e69de29 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java > 3f14644 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java > e612d88 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezPrinter.java > 5a42ded > > src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java > e69de29 > test/org/apache/pig/test/data/GoldenFiles/TEZC1.gld 925f07e > test/org/apache/pig/test/data/GoldenFiles/TEZC2.gld a3974fe > test/org/apache/pig/test/data/GoldenFiles/TEZC3.gld a8c942b > test/org/apache/pig/test/data/GoldenFiles/TEZC4.gld fb7c903 > test/org/apache/pig/test/data/GoldenFiles/TEZC5.gld e6cd25e > > Diff: https://reviews.apache.org/r/15261/diff/ > > > Testing > ------- > > ant test-tez passes. > ant test-e2e-tez passes. > > I didn't add new test cases, but an e2e test case (Checkin_3) includes an > algebraic udf (count) following group-by. I also manually tested it on a live > cluster. > > > Thanks, > > Cheolsoo Park > >
