-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16313/#review30533
-----------------------------------------------------------


The approach is good when the replicate join is not the first vertex of the DAG 
(i.e., in the MR case, the replicate join is part of a reduce). If it is the 
first vertex of the DAG, we need to compare and confirm that this approach does 
not regress performance relative to MR's map-only replicate join, which uses the 
distributed cache. Created PIG-3631 for follow-up.


src/org/apache/pig/backend/hadoop/executionengine/tez/POFRJoinTez.java
<https://reviews.apache.org/r/16313/#comment58489>

    We should set the input keys on POFRJoinTez in TezCompiler or TezDagBuilder 
and use that instead of just picking all matching instances of 
ShuffledUnorderedKVInput.
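
    A rough sketch of the idea, using stand-in types only. Input, 
BroadcastInput, selectByType, selectByKey and the scope-40 key are hypothetical 
names made up for illustration; none of them are the actual Tez or Pig classes, 
and the real wiring in TezCompiler or TezDagBuilder may well look different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: every type here is a stand-in, not a real Tez/Pig class.
public class FrJoinInputSelection {

    // Hypothetical stand-in for a logical input attached to a vertex.
    interface Input {}

    // Plays the role of ShuffledUnorderedKVInput in this sketch.
    static final class BroadcastInput implements Input {
        final String description;
        BroadcastInput(String description) { this.description = description; }
    }

    // Any other kind of input on the same vertex.
    static final class OtherInput implements Input {}

    // Type-based selection: takes every input of the broadcast type,
    // whether or not it belongs to this join.
    static List<Input> selectByType(Map<String, Input> inputs) {
        List<Input> selected = new ArrayList<>();
        for (Input in : inputs.values()) {
            if (in instanceof BroadcastInput) {
                selected.add(in);
            }
        }
        return selected;
    }

    // Key-based selection: the compiler records the keys of the replicated
    // inputs on the operator, and the operator looks up exactly those.
    static List<Input> selectByKey(Map<String, Input> inputs, List<String> inputKeys) {
        List<Input> selected = new ArrayList<>();
        for (String key : inputKeys) {
            Input in = inputs.get(key);
            if (in == null) {
                throw new IllegalStateException("No input found for key " + key);
            }
            selected.add(in);
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Input> inputs = new LinkedHashMap<>();
        inputs.put("scope-26", new BroadcastInput("replicated table of this join"));
        inputs.put("scope-40", new BroadcastInput("broadcast input of another operator"));
        inputs.put("scope-12", new OtherInput());

        System.out.println("by type: " + selectByType(inputs).size());
        System.out.println("by key:  "
                + selectByKey(inputs, Collections.singletonList("scope-26")).size());
    }
}
```

    With explicit keys, a second broadcast input on the same vertex is not 
picked up by mistake, and a missing input fails loudly instead of being 
silently skipped.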



src/org/apache/pig/backend/hadoop/executionengine/tez/POFRJoinTez.java
<https://reviews.apache.org/r/16313/#comment58498>

    We should make this an info statement. It is going to be logged only 
once anyway and will be good information for debugging.



test/org/apache/pig/test/data/GoldenFiles/TEZC10.gld
<https://reviews.apache.org/r/16313/#comment58494>

    Can we print out the input key of the FRJoin similar to output in 
POLocalRearrange for easy debugging/understanding of the plan?
    
    For example:
     c: FRJoin[tuple] - scope-18 <- scope-26



test/org/apache/pig/tez/TestTezCompiler.java
<https://reviews.apache.org/r/16313/#comment58495>

    Can we add cases for
     - a three- or four-way join?
     - a replicated table that is part of a reduce output instead of being 
loaded directly? This is to handle the case where you don't create a separate 
vertex to broadcast, but broadcast from an existing vertex (POLocalRearrange) 
by just changing the edge type to broadcast. I don't think the TezCompiler 
handles this now.


- Rohini Palaniswamy


On Dec. 17, 2013, 3:51 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16313/
> -----------------------------------------------------------
> 
> (Updated Dec. 17, 2013, 3:51 a.m.)
> 
> 
> Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini 
> Palaniswamy.
> 
> 
> Bugs: PIG-3604
>     https://issues.apache.org/jira/browse/PIG-3604
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> Implemented replicated join in Tez as follows:
> - POFRJoinTez extends POFRJoin. The difference between the two is that the 
> replication hash table is constructed from broadcast edges in Tez 
> instead of files on the distributed cache in MR.
> - TezCompiler adds a vertex per replicated table and connects it to the 
> POFRJoin vertex via a broadcast edge.
> 
> Note that in POLocalRearrangeTez, I package tuples in the same way for 
> broadcast and scatter/gather edges, so I removed outputType 
> (DataMovementType). 
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java d7c54d8 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java e900751 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POFRJoinTez.java e69de29 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java cda5d89 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 7a1736a 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 2584501 
>   test/e2e/pig/tests/tez.conf b280698 
>   test/org/apache/pig/test/data/GoldenFiles/TEZC10.gld e69de29 
>   test/org/apache/pig/tez/TestTezCompiler.java 79dc94e 
> 
> Diff: https://reviews.apache.org/r/16313/diff/
> 
> 
> Testing
> -------
> 
> Added a unit test case to TestTezCompiler.
> Added an e2e test case to Join.
> 
> ant test-tez passes.
> e2e test passes.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>
