> On Dec. 1, 2013, 5:06 p.m., Rohini Palaniswamy wrote:
> > The code is fine if we have union after some processing. But for the simple 
> > load-and-union case below, this will create 3 vertices: 2 load vertices 
> > and one union vertex. 
> > 
> > a = load 'a';
> > b = load 'b';
> > c = union a, b;
> > 
> >  In MR, this is handled in a simple map 
> > 
> > C: Store(/tmp/tezout:PigStorage) - scope-23
> > |
> > |---C: Union[bag] - scope-22
> >     |
> >     |---A: New For Each(false,false,false)[bag] - scope-10
> >     |   |   |
> >     |   ..........
> >     |
> >     |---B: New For Each(false,false,false)[bag] - scope-21
> >         |   |
> >         |   .........
> >         |
> >         |---B: Load(/tmp/data:org.apache.pig.builtin.PigStorage) - scope-11--------
> > 
> > We should also try to do that in a single vertex to be more optimal. We can 
> > handle that in a separate jira though. 
> >
> 
> Cheolsoo Park wrote:
>     Thank you Rohini for the review!
>     
>     You're right that we can optimize it once Tez allows multiple inputs on 
> root vertices. But when I tried to implement union in a single vertex, I ran 
> into this error:
>     
>     Caused by: java.lang.IllegalStateException: For now, only a single Root 
> Input can be added to a Vertex
>         at org.apache.tez.dag.api.Vertex.addInput(Vertex.java:156)
>     
>     So it seems this is not allowed for now.

MR handles it with multiple inputs in PigInputFormat. We should be able to do 
that with Tez too. 
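
As a rough sketch of the idea only (plain Java, not Pig or Tez code; the class 
name ConcatenatingReader and its drain helper are hypothetical), a single task 
could read its inputs one after another and emit every record, which is all a 
load-only union needs:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration: not a Pig or Tez class.
// Reads several record sources back to back, so one task sees their union.
class ConcatenatingReader<T> implements Iterator<T> {
    private final Iterator<? extends Iterable<T>> sources;
    private Iterator<T> current = null;

    ConcatenatingReader(List<? extends Iterable<T>> inputs) {
        this.sources = inputs.iterator();
    }

    @Override
    public boolean hasNext() {
        // Skip over exhausted (or empty) inputs until a record is available.
        while ((current == null || !current.hasNext()) && sources.hasNext()) {
            current = sources.next().iterator();
        }
        return current != null && current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    // Convenience helper: collect every record from every input, in input order.
    static <T> List<T> drain(List<? extends Iterable<T>> inputs) {
        List<T> out = new ArrayList<>();
        ConcatenatingReader<T> reader = new ConcatenatingReader<>(inputs);
        while (reader.hasNext()) {
            out.add(reader.next());
        }
        return out;
    }
}
```

For example, draining two inputs holding the records of 'a' and 'b' yields all 
records of 'a' followed by all records of 'b', with no separate union step.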


- Rohini


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15931/#review29567
-----------------------------------------------------------


On Dec. 1, 2013, 7 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15931/
> -----------------------------------------------------------
> 
> (Updated Dec. 1, 2013, 7 a.m.)
> 
> 
> Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini 
> Palaniswamy.
> 
> 
> Bugs: PIG-3585
>     https://issues.apache.org/jira/browse/PIG-3585
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> This patch implements union as follows: load vertices -> broadcast edges -> 
> union vertex.
> 
> The changes include:
> * In the front-end, TezCompiler converts POUnion into a new vertex and 
> connects it to its predecessors with broadcast edges.
> * In the back-end, a new POPackage class called POBroadcastTezLoad is added. 
> This class implements the TezLoad interface, and it pulls every record from 
> ShuffledUnorderedKVInputs in order and unions them.
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/Packager.java e49de40
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POBroadcastTezLoad.java e69de29
>   src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 9a2b499
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 529bf30
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java e3f5a5d
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java dcd6a5a
>   test/e2e/pig/tests/tez.conf 7fd5fb1 
> 
> Diff: https://reviews.apache.org/r/15931/diff/
> 
> 
> Testing
> -------
> 
> * New e2e test case is added.
> * ant test-tez passes.
> * All e2e tests pass.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>
