-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19724/#review38772
-----------------------------------------------------------

Ship it!


Thank you for the awesome patch!


http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
<https://reviews.apache.org/r/19724/#comment71058>

    This sounds like a good refactoring. Just curious: are we going to do
    the same for POLR too, or is this just for POVO?


http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/operators/POCounterStatsTez.java
<https://reviews.apache.org/r/19724/#comment71051>

    Just a question: should we move all the Tez physical operators under
    this package?


- Cheolsoo Park


On March 27, 2014, 5:30 p.m., Rohini Palaniswamy wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19724/
> -----------------------------------------------------------
>
> (Updated March 27, 2014, 5:30 p.m.)
>
>
> Review request for pig, Cheolsoo Park and Daniel Dai.
>
>
> Bugs: PIG-3814
>     https://issues.apache.org/jira/browse/PIG-3814
>
>
> Repository: pig
>
>
> Description
> -------
>
> Rank implementation in Tez is different from MR implementation.
>   * MR Implementation has 1 map-only job (POCounter) which sets the current
> taskId at position 0 of the tuple and the local map task counter at
> position 1. It also emits job Counters for the number of records in that
> map task. JobControlCompiler collects those, calculates offsets and
> launches the next map-only job (PORank) with that offset information in
> the jobconf.
>   * Tez Implementation has 3 vertices. Vertex 1 outputs tuples from
> POCounter to Vertex 3. It also outputs the counters to Vertex 2, which
> calculates the offsets and broadcasts them to Vertex 3.
>
> Common (MR and Tez) Perf optimizations made:
>   - Changed taskid to be Integer instead of String to reduce memory
> overhead.
>   - POCounter sets the current taskId at position 0 of the tuple and the
> counter at position 1. PORank creates a new tuple of size-1 to remove the
> task id and copies over the rest, which is a lot of overhead. Now setting
> the task id as the last element of the tuple and removing that from the
> arraylist instead of doing a copy.
>
>
> Diffs
> -----
>
>   http://svn.apache.org/repos/asf/pig/branches/tez/ivy/libraries.properties 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigMapReduceCounter.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCounter.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POValueOutputTez.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezEdgeDescriptor.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezTaskConfigurable.java PRE-CREATION
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/operators/POCounterStatsTez.java PRE-CREATION
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/operators/POCounterTez.java PRE-CREATION
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/operators/PORankTez.java PRE-CREATION
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/e2e/pig/drivers/TestDriverPig.pm 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/e2e/pig/tests/nightly.conf 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/TestCombiner.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC19.gld 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC20.gld PRE-CREATION
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC21.gld PRE-CREATION
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/tez/TestTezCompiler.java 1582317
>
> Diff: https://reviews.apache.org/r/19724/diff/
>
>
> Testing
> -------
>
> Enabled Rank e2e tests for Tez. All pass except Rank 9 and 11. Rank 9 has
> a Tez map output data corruption issue that is yet to be investigated.
> Rank 11 is an issue with SPLIT, and I am aware of the reason. The input
> keys need to be updated in MultiQueryOptimizerTez after Tez operators have
> been merged. That is already done for POFRJoinTez, but I am trying to think
> of a generic way to do this (new interfaces to get input keys and output
> keys) so that we don't have to add every operator to
> MultiQueryOptimizerTez. Will do that in a separate jira.
>
>
> Thanks,
>
> Rohini Palaniswamy
>
>
