-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19724/#review38772
-----------------------------------------------------------

Ship it!


Thank you for the awesome patch!


http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
<https://reviews.apache.org/r/19724/#comment71058>

    This sounds like a good refactoring.
    
    Just curious. Are we going to do the same for POLR too, or is this just
    for POVO?



http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/operators/POCounterStatsTez.java
<https://reviews.apache.org/r/19724/#comment71051>

    Just a question. Should we move all the Tez physical operators under
    this package?


- Cheolsoo Park


On March 27, 2014, 5:30 p.m., Rohini Palaniswamy wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19724/
> -----------------------------------------------------------
> 
> (Updated March 27, 2014, 5:30 p.m.)
> 
> 
> Review request for pig, Cheolsoo Park and Daniel Dai.
> 
> 
> Bugs: PIG-3814
>     https://issues.apache.org/jira/browse/PIG-3814
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> The Rank implementation in Tez differs from the MR implementation.
>   * The MR implementation has one map-only job (POCounter), which sets the
> current taskId at position 0 of the tuple and the local map task counter at
> position 1. It also emits job counters for the number of records in that map
> task. JobControlCompiler collects those, calculates the offsets, and launches
> the next map-only job (PORank) with the offset information in the jobconf.
>   * The Tez implementation has three vertices. Vertex 1 outputs tuples from
> POCounter to Vertex 3. It also outputs the counters to Vertex 2, which
> calculates the offsets and broadcasts them to Vertex 3.
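
The offset step described above (JobControlCompiler in MR, Vertex 2 in Tez) might look roughly like the following. This is only a minimal sketch, not the code in the patch: the class RankOffsets and the methods computeOffsets and globalRank are hypothetical names, and it assumes the offset of a task is the total record count of all lower-numbered tasks.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; none of these names come from the patch.
class RankOffsets {

    // Assumption: the offset of a task is the number of records emitted by
    // all tasks with a smaller task id.
    static Map<Integer, Long> computeOffsets(Map<Integer, Long> recordsPerTask) {
        Map<Integer, Long> offsets = new TreeMap<>();
        long running = 0L;
        // Copy into a TreeMap so that tasks are visited in ascending id order.
        for (Map.Entry<Integer, Long> e : new TreeMap<>(recordsPerTask).entrySet()) {
            offsets.put(e.getKey(), running);
            running += e.getValue();
        }
        return offsets;
    }

    // The rank side adds the offset of the task to the task-local counter
    // that was stored in the tuple.
    static long globalRank(Map<Integer, Long> offsets, int taskId, long localCounter) {
        return offsets.get(taskId) + localCounter;
    }
}
```

With per-task counts {0=3, 1=4, 2=5}, the sketch yields offsets {0=0, 1=3, 2=7}.
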
> 
> Common (MR and Tez) perf optimizations made:
>    - Changed taskid to be an Integer instead of a String to reduce memory
> overhead.
>    - POCounter set the current taskId at position 0 of the tuple and the
> counter at position 1. PORank created a new tuple of size-1 to remove the
> task id and copied over the rest, which is a lot of overhead. The task id is
> now set as the last element of the tuple and removed from the ArrayList
> instead of doing a copy.
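
To illustrate why the second optimization helps, here is a minimal sketch using a plain java.util.ArrayList in place of a Pig tuple. The class TaskIdPlacement and both method names are hypothetical, and the field layout is an assumption for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; a plain list stands in for the tuple fields.
class TaskIdPlacement {

    // Task id at position 0: dropping it means building a new list and
    // copying every remaining field.
    static List<Object> dropLeadingTaskId(List<Object> fields) {
        List<Object> out = new ArrayList<>(fields.size() - 1);
        for (int i = 1; i < fields.size(); i++) {
            out.add(fields.get(i));
        }
        return out;
    }

    // Task id at the last position: removing the final element of an
    // ArrayList shifts nothing and allocates nothing.
    static Integer popTrailingTaskId(List<Object> fields) {
        return (Integer) fields.remove(fields.size() - 1);
    }
}
```

In the second variant the caller keeps using the same list, which now holds only the remaining fields.
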
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/pig/branches/tez/ivy/libraries.properties 
> 1582317 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigMapReduceCounter.java
>  1582317 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCounter.java
>  1582317 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java
>  1582317 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POValueOutputTez.java
>  1582317 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java
>  1582317 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
>  1582317 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
>  1582317 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezEdgeDescriptor.java
>  1582317 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezTaskConfigurable.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/operators/POCounterStatsTez.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/operators/POCounterTez.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/operators/PORankTez.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/test/e2e/pig/drivers/TestDriverPig.pm
>  1582317 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/test/e2e/pig/tests/nightly.conf
>  1582317 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/TestCombiner.java
>  1582317 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC19.gld
>  1582317 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC20.gld
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC21.gld
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/tez/TestTezCompiler.java
>  1582317 
> 
> Diff: https://reviews.apache.org/r/19724/diff/
> 
> 
> Testing
> -------
> 
> Enabled the Rank e2e tests for Tez. All pass except Rank 9 and 11. Rank 9
> has some Tez map output data corruption issue, which is yet to be
> investigated. Rank 11 is an issue with SPLIT, and I am aware of the reason:
> the input keys need to be updated in MultiQueryOptimizerTez after Tez
> operators have been merged. That is already done for POFRJoinTez, but I am
> trying to think of a generic way to do this (new interfaces to get input
> keys and output keys) so that we don't have to add every operator to
> MultiQueryOptimizerTez. Will do that in a separate jira.
> 
> 
> Thanks,
> 
> Rohini Palaniswamy
> 
>
