-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16313/
-----------------------------------------------------------
(Updated Dec. 18, 2013, 3:04 a.m.)
Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini
Palaniswamy.
Changes
-------
Incorporated Rohini's comments:
* Adds an e2e test case for outer replicated join.
* Adds a unit test case for 3-way replicated join.
* Adds a unit test case for replicated join in reducer.
* Cleans up POShuffleTezLoad code to make use of inputKeys. Now
POShuffleTezLoad#attachInputs() looks up LogicalInputs by inputKey and
attaches only the applicable ones to itself. For example, it attaches
ShuffledMergedInputs but ignores ShuffledUnorderedKVInputs. This is needed
because both broadcast and scatter/gather edges can be attached to the same
vertex. In that case, each operator in the vertex should attach only the
inputs that apply to it.
* Includes the fix for PIG-3624 (establishing the order of joined columns).
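A minimal sketch of the inputKey lookup described in the POShuffleTezLoad item above. The types here (Input, MergedInput, UnorderedInput) and the class name are hypothetical stand-ins for illustration, not the real Tez or Pig classes, and the actual patch may be structured differently:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InputKeySelection {
    // Hypothetical stand-ins; NOT the real Tez LogicalInput classes.
    interface Input {}
    static class MergedInput implements Input {}    // plays the role of ShuffledMergedInput
    static class UnorderedInput implements Input {} // plays the role of ShuffledUnorderedKVInput

    // Looks up each input by key and keeps only those of the applicable type.
    // Keys with no matching input, or with an input of another type, are skipped.
    static List<MergedInput> attachInputs(Map<String, Input> inputs, List<String> inputKeys) {
        List<MergedInput> attached = new ArrayList<>();
        for (String key : inputKeys) {
            Input in = inputs.get(key);
            if (in instanceof MergedInput) {
                attached.add((MergedInput) in);
            }
        }
        return attached;
    }

    public static void main(String[] args) {
        // A vertex with both a scatter/gather input and a broadcast input.
        Map<String, Input> inputs = new LinkedHashMap<>();
        inputs.put("scope-10", new MergedInput());
        inputs.put("scope-20", new UnorderedInput());

        List<MergedInput> attached = attachInputs(inputs, Arrays.asList("scope-10", "scope-20"));
        System.out.println("attached inputs: " + attached.size());
    }
}
```

The point of the sketch is only that the operator filters by its own keys and by input type, so a broadcast input on the same vertex is left for another operator to consume.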
ant test-tez passes.
e2e test passes.
Bugs: PIG-3604
https://issues.apache.org/jira/browse/PIG-3604
Repository: pig-git
Description
-------
Implemented replicated join in Tez as follows:
- POFRJoinTez extends POFRJoin. The difference between the two is that the
replication hash table is built from broadcast edges in Tez rather than from
files in the distributed cache in MR.
- TezCompiler adds a vertex per replicated table and connects it to the
POFRJoin vertex via a broadcast edge.
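For readers unfamiliar with replicated joins, the idea can be sketched as a plain in-memory hash join. This is an illustration only: the class and method names are hypothetical, tuples are modeled as String arrays, and ordinary lists stand in for the tuples that would arrive over a broadcast edge. It shows an inner join and is not the POFRJoinTez implementation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReplicatedJoinSketch {
    // Builds the replication hash table: join key -> all replicated tuples with that key.
    static Map<String, List<String[]>> buildHashTable(Iterable<String[]> replicated, int keyIndex) {
        Map<String, List<String[]>> table = new HashMap<>();
        for (String[] tuple : replicated) {
            table.computeIfAbsent(tuple[keyIndex], k -> new ArrayList<>()).add(tuple);
        }
        return table;
    }

    // Streams the large (fragment) input and probes the hash table for matches.
    static List<String[]> join(Iterable<String[]> fragment, int keyIndex,
                               Map<String, List<String[]>> table) {
        List<String[]> out = new ArrayList<>();
        for (String[] left : fragment) {
            List<String[]> matches = table.get(left[keyIndex]);
            if (matches == null) {
                continue; // inner join: drop fragment tuples with no match
            }
            for (String[] right : matches) {
                // Output tuple = fragment columns followed by replicated columns.
                String[] joined = Arrays.copyOf(left, left.length + right.length);
                System.arraycopy(right, 0, joined, left.length, right.length);
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> small = Arrays.asList(new String[]{"1", "x"}, new String[]{"2", "y"});
        List<String[]> large = Arrays.asList(
                new String[]{"1", "a"}, new String[]{"3", "b"}, new String[]{"1", "c"});
        for (String[] row : join(large, 0, buildHashTable(small, 0))) {
            System.out.println(String.join(",", row));
        }
    }
}
```

In this picture, only the source of the replicated tuples differs between the two engines: files in the distributed cache in MR, broadcast edges in Tez.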
Note that in POLocalRearrangeTez, I package tuples in the same way for
broadcast and scatter/gather edges, so I removed outputType (DataMovementType).
Diffs (updated)
-----
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java
d7c54d8
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java
e900751
src/org/apache/pig/backend/hadoop/executionengine/tez/POFRJoinTez.java
e69de29
src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java
cda5d89
src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java
d76cfc5
src/org/apache/pig/backend/hadoop/executionengine/tez/POUnionTezLoad.java
e6f9be5
src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java
7a1736a
src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
2584501
src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
96ccdde
test/e2e/pig/tests/tez.conf b280698
test/org/apache/pig/test/data/GoldenFiles/TEZC10.gld e69de29
test/org/apache/pig/test/data/GoldenFiles/TEZC11.gld e69de29
test/org/apache/pig/tez/TestTezCompiler.java 79dc94e
Diff: https://reviews.apache.org/r/16313/diff/
Testing
-------
Added a unit test case to TestTezCompiler.
Added an e2e test case to Join.
ant test-tez passes.
e2e test passes.
Thanks,
Cheolsoo Park