> On March 27, 2014, 6:27 p.m., Cheolsoo Park wrote:
> > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java,
> >  line 252
> > <https://reviews.apache.org/r/19724/diff/2/?file=538500#file538500line252>
> >
> >     This sounds like a good refactoring.
> >     
> >     Just curious. Are we going to do the same for POLR too, or is this just 
> > for POVO?

POLR would be easier to handle based on the key type here, as there would be a lot 
of instances of it and it also comes from the physical plan. POValueOutputTez is 
used in only a few places and is always added by us, so I was planning to do this 
only for that.


> On March 27, 2014, 6:27 p.m., Cheolsoo Park wrote:
> > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/operators/POCounterStatsTez.java,
> >  line 19
> > <https://reviews.apache.org/r/19724/diff/2/?file=538503#file538503line19>
> >
> >     Just a question. Should we move all the Tez physical operators under 
> > this package?

Yes. I was planning to open another JIRA to do the svn mv, and have created 
PIG-3838 for that. There are just too many classes under the tez package now; it 
is time to organize them into subpackages. I did not want to clutter this patch 
with that.


- Rohini


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19724/#review38772
-----------------------------------------------------------


On March 27, 2014, 5:30 p.m., Rohini Palaniswamy wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19724/
> -----------------------------------------------------------
> 
> (Updated March 27, 2014, 5:30 p.m.)
> 
> 
> Review request for pig, Cheolsoo Park and Daniel Dai.
> 
> 
> Bugs: PIG-3814
>     https://issues.apache.org/jira/browse/PIG-3814
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> The Rank implementation in Tez differs from the MR implementation.
>   * The MR implementation has one map-only job (POCounter), which sets the 
> current taskId at position 0 of the tuple and the local map task counter at 
> position 1. It also emits job counters for the number of records in that map 
> task. JobControlCompiler collects those, calculates the offsets, and launches 
> the next map-only job (PORank) with that offset information in the jobconf.
>   * The Tez implementation has 3 vertices. Vertex 1 outputs tuples from 
> POCounter to Vertex 3. It also outputs the counters to Vertex 2, which 
> calculates the offsets and broadcasts them to Vertex 3.
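A minimal Java sketch of the offset step described above, assuming each POCounter task reports its record count keyed by an Integer task id and that the task-local counter starts at 1. The class and method names (RankOffsets, computeOffsets, rank) are hypothetical; this is not the code of POCounterStatsTez or PORankTez, only an illustration of the prefix-sum idea.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: the counter-stats vertex turns per-task record counts
 * into per-task starting offsets, and the rank vertex adds a record's
 * task-local counter to its task's offset to get a global rank.
 */
public class RankOffsets {

    /** Prefix sum over per-task record counts, in ascending task-id order. */
    public static Map<Integer, Long> computeOffsets(Map<Integer, Long> recordsPerTask) {
        Map<Integer, Long> offsets = new TreeMap<>();
        long running = 0;
        // Copying into a TreeMap makes iteration follow ascending task ids
        for (Map.Entry<Integer, Long> e : new TreeMap<>(recordsPerTask).entrySet()) {
            offsets.put(e.getKey(), running);
            running += e.getValue();
        }
        return offsets;
    }

    /** Global rank = offset of the record's task + its 1-based local counter (assumed). */
    public static long rank(Map<Integer, Long> offsets, int taskId, long localCounter) {
        return offsets.get(taskId) + localCounter;
    }

    public static void main(String[] args) {
        // Made-up counts: task 0 saw 3 records, task 1 saw 2, task 2 saw 4
        Map<Integer, Long> counts = Map.of(0, 3L, 1, 2L, 2, 4L);
        Map<Integer, Long> offsets = computeOffsets(counts);
        System.out.println(offsets);             // {0=0, 1=3, 2=5}
        System.out.println(rank(offsets, 1, 1)); // 4
        System.out.println(rank(offsets, 2, 4)); // 9
    }
}
```

Because only the small count map travels to the rank vertex, broadcasting it is cheap compared with moving the tuples themselves.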
> 
> Common (MR and Tez) performance optimizations made:
>    - Changed the taskid to be an Integer instead of a String to reduce memory 
> overhead.
>    - POCounter set the current taskId at position 0 of the tuple and the 
> counter at position 1, so PORank had to create a new tuple one element smaller 
> to drop the task id and copy over the rest, which is a lot of overhead. The 
> task id is now set as the last element of the tuple and removed from the 
> ArrayList instead of doing a copy.
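A small Java sketch contrasting the two layouts, with a plain ArrayList standing in for the tuple's field list (an assumption; Pig's Tuple API is not used here). It assumes the other fields keep their relative order and that the task id is an Integer. TaskIdLayout and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: dropping the task id from the front vs. the back of a field list. */
public class TaskIdLayout {

    /** Old layout: task id at index 0, so every remaining field is copied (O(n)). */
    public static List<Object> dropLeadingTaskId(List<Object> fields) {
        return new ArrayList<>(fields.subList(1, fields.size()));
    }

    /** New layout: task id is the last field; ArrayList removes it without shifting (O(1)). */
    public static Integer dropTrailingTaskId(List<Object> fields) {
        // size() - 1 is an int, so this calls remove(int index), not remove(Object)
        return (Integer) fields.remove(fields.size() - 1);
    }

    public static void main(String[] args) {
        // Made-up tuple: [taskId, counter, a, b]
        List<Object> oldStyle = new ArrayList<>(Arrays.<Object>asList(7, 1L, "a", "b"));
        System.out.println(dropLeadingTaskId(oldStyle)); // [1, a, b]

        // Made-up tuple: [counter, a, b, taskId]
        List<Object> newStyle = new ArrayList<>(Arrays.<Object>asList(1L, "a", "b", 7));
        Integer taskId = dropTrailingTaskId(newStyle);
        System.out.println(taskId + " " + newStyle);     // 7 [1, a, b]
    }
}
```

The in-place removal also avoids allocating a second list per record, which matters when every input tuple passes through PORank.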
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/pig/branches/tez/ivy/libraries.properties 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigMapReduceCounter.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCounter.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POValueOutputTez.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezEdgeDescriptor.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezTaskConfigurable.java PRE-CREATION
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/operators/POCounterStatsTez.java PRE-CREATION
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/operators/POCounterTez.java PRE-CREATION
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/operators/PORankTez.java PRE-CREATION
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/e2e/pig/drivers/TestDriverPig.pm 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/e2e/pig/tests/nightly.conf 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/TestCombiner.java 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC19.gld 1582317
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC20.gld PRE-CREATION
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC21.gld PRE-CREATION
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/tez/TestTezCompiler.java 1582317
> 
> Diff: https://reviews.apache.org/r/19724/diff/
> 
> 
> Testing
> -------
> 
> Enabled the Rank e2e tests for Tez. All pass except Rank 9 and 11. Rank 9 hits 
> a Tez map output data corruption issue that I have yet to investigate. Rank 11 
> is an issue with SPLIT, and I am aware of the reason: the input keys need to be 
> updated in MultiQueryOptimizerTez after Tez operators have been merged. That is 
> already done for POFRJoinTez, but I am trying to think of a generic way to do 
> this (new interfaces to get input keys and output keys) so that we don't have 
> to add every operator to MultiQueryOptimizerTez. Will do that in a separate 
> JIRA.
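One possible shape for the generic input-key interface, sketched under loose assumptions: HasTezInputs, getInputKeys, replaceInputKey, DemoOperator, InputKeyRewrite and the "scope-N" keys are all hypothetical, not existing Pig classes or the eventual design. The point is only that the optimizer can rewrite keys through the interface instead of special-casing POFRJoinTez and every other operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical interface: an operator that reads from named Tez inputs. */
interface HasTezInputs {
    List<String> getInputKeys();
    void replaceInputKey(String oldKey, String newKey);
}

/** Hypothetical operator standing in for any Tez operator with inputs. */
class DemoOperator implements HasTezInputs {
    private final List<String> inputKeys;

    DemoOperator(String... keys) {
        this.inputKeys = new ArrayList<>(Arrays.asList(keys));
    }

    public List<String> getInputKeys() {
        return inputKeys;
    }

    public void replaceInputKey(String oldKey, String newKey) {
        inputKeys.replaceAll(k -> k.equals(oldKey) ? newKey : k);
    }
}

public class InputKeyRewrite {
    /** After the vertex oldKey is merged into newKey, fix up all operators generically. */
    public static void rewriteAll(List<? extends HasTezInputs> ops, String oldKey, String newKey) {
        for (HasTezInputs op : ops) {
            if (op.getInputKeys().contains(oldKey)) {
                op.replaceInputKey(oldKey, newKey);
            }
        }
    }

    public static void main(String[] args) {
        DemoOperator join = new DemoOperator("scope-1", "scope-5");
        DemoOperator rank = new DemoOperator("scope-5");
        rewriteAll(List.of(join, rank), "scope-5", "scope-9");
        System.out.println(join.getInputKeys()); // [scope-1, scope-9]
        System.out.println(rank.getInputKeys()); // [scope-9]
    }
}
```

A matching method for output keys would let the same loop handle the other side of a merged edge.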
> 
> 
> Thanks,
> 
> Rohini Palaniswamy
> 
>
