-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40743/
-----------------------------------------------------------
(Updated Jan. 27, 2016, 9:25 a.m.)
Review request for pig, Mohit Sabharwal and Xuefu Zhang.
Changes
-------
Rebased and addressed review comments
Bugs: PIG-4709
https://issues.apache.org/jira/browse/PIG-4709
Repository: pig-git
Description
-------
Currently, the GROUPBY operator of Pig is mapped to Spark's CoGroup. When the
grouped data is consumed by subsequent algebraic operations, this is
sub-optimal because it generates a lot of shuffle traffic.
The Spark plan should be optimized to use reduceBy, where possible, so that a
combiner is used.
Introduced a combiner optimizer that does the following:
// Checks for algebraic operations and, if they exist,
// replaces global rearrange (cogroup) with reduceBy as follows:
// Input:
// foreach (using algebraicOp)
//   -> packager
//     -> globalRearrange
//       -> localRearrange
// Output:
// foreach (using algebraicOp.Final)
//   -> reduceBy (uses algebraicOp.Intermediate)
//     -> foreach (using algebraicOp.Initial)
//       -> localRearrange
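The Initial / Intermediate / Final split above can be illustrated with a small
plain-Java sketch. This is not code from the patch and it does not use the Spark
or Pig APIs: the class CombinerSketch, the Partial type and every method name
are hypothetical, and AVG is chosen only as a familiar example of an algebraic
function. Partitions are modelled as in-memory lists, and "shuffled records"
simply counts what would leave each partition after combining.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; none of these names come from the patch. */
final class CombinerSketch {

    /** Partial state for AVG: a running sum and a count. */
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** Convenience factory for a (key, value) input record. */
    static Map.Entry<String, Long> record(String key, long value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Initial: turn one input value into a partial aggregate.
    static Partial initial(long value) {
        return new Partial(value, 1);
    }

    // Intermediate: merge two partials. It is associative and commutative,
    // which is what makes it usable as a reduceBy-style function.
    static Partial intermediate(Partial a, Partial b) {
        return new Partial(a.sum + b.sum, a.count + b.count);
    }

    // Final: compute the result from the fully merged partial.
    static double finalStep(Partial p) {
        return (double) p.sum / p.count;
    }

    /** Combines within one partition, emulating the map-side combiner. */
    static Map<String, Partial> combinePartition(List<Map.Entry<String, Long>> partition) {
        Map<String, Partial> combined = new HashMap<>();
        for (Map.Entry<String, Long> rec : partition) {
            combined.merge(rec.getKey(), initial(rec.getValue()), CombinerSketch::intermediate);
        }
        return combined;
    }

    /** Merges the per-partition partials and applies Final to each key. */
    static Map<String, Double> average(List<List<Map.Entry<String, Long>>> partitions) {
        Map<String, Partial> merged = new HashMap<>();
        for (List<Map.Entry<String, Long>> partition : partitions) {
            combinePartition(partition).forEach(
                (key, partial) -> merged.merge(key, partial, CombinerSketch::intermediate));
        }
        Map<String, Double> result = new TreeMap<>();
        merged.forEach((key, partial) -> result.put(key, finalStep(partial)));
        return result;
    }

    /** Counts the records that would cross the shuffle after combining. */
    static int shuffledRecords(List<List<Map.Entry<String, Long>>> partitions) {
        int count = 0;
        for (List<Map.Entry<String, Long>> partition : partitions) {
            count += combinePartition(partition).size();
        }
        return count;
    }
}
```

In this model, a cogroup-style plan would move every input record across the
shuffle, whereas the combined plan moves at most one partial per key per
partition. The saving grows with the number of records that share a key within
a partition.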
Diffs (updated)
-----
src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java
4e7bf00
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/GlobalRearrangeConverter.java
5f74992
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LocalRearrangeConverter.java
9ce0492
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PigSecondaryKeyComparatorSpark.java
PRE-CREATION
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ReduceByConverter.java
PRE-CREATION
src/org/apache/pig/backend/hadoop/executionengine/spark/operator/POReduceBySpark.java
PRE-CREATION
src/org/apache/pig/backend/hadoop/executionengine/spark/optimizer/CombinerOptimizer.java
PRE-CREATION
src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java
6b66ca1
src/org/apache/pig/backend/hadoop/executionengine/util/SecondaryKeyOptimizerUtil.java
546d91e
test/org/apache/pig/test/TestCombiner.java
df44293
Diff: https://reviews.apache.org/r/40743/diff/
Testing
-------
The patch unblocks one unit test in TestCombiner and adds another unit test to
the same class. Some manual testing was also done.
Thanks,
Pallavi Rao