-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32868/
-----------------------------------------------------------
Review request for pig and Daniel Dai.
Bugs: PIG-4495
https://issues.apache.org/jira/browse/PIG-4495
Repository: pig
Description
-------
Patch is a work in progress (WIP). The code is almost complete. I am in the
process of adding more tests and running the full suite. Posting now for any
early comments.
This patch removes the need for TEZ-1190 (Allow multiple edges between two
vertexes).
Changes done:
1) Self join/cross/cogroup
- Multiple sub-plans of the split write to the same output.
POShuffleTezLoad can now split the input into the correct bags based
on the index in the key.
- Cases like self-replicate join are not allowed.
2) Union
- Multiple sub-plans of the split write to the same output and connect to
the vertex group. If only sub-plans of the split are members of the union, then
no vertex group is created and the split is connected directly to the union's
successors.
- For cases like nightly.conf Union_16.pig, which has multiple levels of
union all from the same split, even the vertex group that was created is removed
and all the split sub-plans write directly to the successor.
3) Other optimizations
- Previously, a union followed by a replicate join was not optimized
(PIG-3856). If the union is within the same split, we now broadcast the
replicate join once to the split operator.
4) Refactored the code in UnionOptimizer into separate methods for readability.
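To make the idea in (1) concrete for reviewers, here is a minimal standalone sketch of splitting values that arrive on a single shared input into per-input bags using an index carried with the key. It is an illustration only, not the code in this patch: the class IndexedBagSplitter, the IndexedValue type, and the splitIntoBags method are hypothetical names, and plain Java lists stand in for Pig's bag and key types.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical names, not classes from Pig or Tez.
public class IndexedBagSplitter {

    /** Stand-in for one shuffled record: index says which join input the value came from. */
    public static final class IndexedValue {
        final int index;
        final String value;

        public IndexedValue(int index, String value) {
            this.index = index;
            this.value = value;
        }
    }

    /**
     * Routes every value for one key into the bag of the input it belongs to.
     * All values are assumed to arrive on the same shared input.
     */
    public static List<List<String>> splitIntoBags(List<IndexedValue> values, int numInputs) {
        List<List<String>> bags = new ArrayList<>();
        for (int i = 0; i < numInputs; i++) {
            bags.add(new ArrayList<>());
        }
        for (IndexedValue v : values) {
            if (v.index < 0 || v.index >= numInputs) {
                throw new IllegalArgumentException("Unexpected index: " + v.index);
            }
            // The index, not the physical input, decides the target bag.
            bags.get(v.index).add(v.value);
        }
        return bags;
    }

    public static void main(String[] args) {
        List<IndexedValue> values = new ArrayList<>();
        values.add(new IndexedValue(0, "a"));
        values.add(new IndexedValue(1, "b"));
        values.add(new IndexedValue(0, "c"));
        // Two logical inputs of a self join share one physical input.
        System.out.println(splitIntoBags(values, 2));
    }
}
```

In this sketch the order of values within each bag follows arrival order. Whether the real operator preserves any ordering is not something this example claims.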
Diffs
-----
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezOperator.java
1671263
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezPOPackageAnnotator.java
1671263
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POShuffleTezLoad.java
1671263
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/MultiQueryOptimizerTez.java
1671263
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/TezOperDependencyParallelismEstimator.java
1671263
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/UnionOptimizer.java
1671263
Diff: https://reviews.apache.org/r/32868/diff/
Testing
-------
WIP. Will update with the new tests in the next patch.
Thanks,
Rohini Palaniswamy