> On Dec. 16, 2015, 6:57 a.m., kelly zhang wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java, 
> > line 229
> > <https://reviews.apache.org/r/40743/diff/2/?file=1155791#file1155791line229>
> >
> >     Change 
> >      boolean noCombiner = 
> > conf.getBoolean(PigConfiguration.PIG_EXEC_NO_COMBINER, false);
> >     
> >     to 
> >     
> >      boolean noCombiner = 
> > Boolean.parseBoolean(pc.getProperties().getProperty(
> >                     PigConfiguration.PIG_EXEC_NO_COMBINER, "false"));
> >                     
> >     We get the configuration from the PigContext, not the Configuration. We 
> > can make the same modification to noSecondaryKey and isMultiQuery.

The conf is created locally from the PigContext itself:
Configuration conf = ConfigurationUtil.toConfiguration(pc.getProperties());

The code is more readable with my change, without having to call 
pc.getProperties() every time. Functionality is not affected.
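To illustrate why the two reads agree, here is a minimal, self-contained sketch. It uses plain java.util.Properties and a toy toConfiguration helper standing in for ConfigurationUtil.toConfiguration; the property key string is an assumption for the example, not the real constant value:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ConfEquivalence {
    // Toy stand-in for ConfigurationUtil.toConfiguration(pc.getProperties()):
    // copies every property into a map-backed "configuration".
    static Map<String, String> toConfiguration(Properties props) {
        Map<String, String> conf = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            conf.put(name, props.getProperty(name));
        }
        return conf;
    }

    // Toy stand-in for Configuration.getBoolean(key, default).
    static boolean getBoolean(Map<String, String> conf, String key, boolean dflt) {
        String v = conf.get(key);
        return v == null ? dflt : Boolean.parseBoolean(v);
    }

    public static void main(String[] args) {
        // Assumed key name for illustration only.
        final String PIG_EXEC_NO_COMBINER = "pig.exec.nocombiner";
        Properties pcProperties = new Properties();
        pcProperties.setProperty(PIG_EXEC_NO_COMBINER, "true");

        Map<String, String> conf = toConfiguration(pcProperties);

        // Reviewer's suggestion: read straight from the PigContext properties.
        boolean fromProperties = Boolean.parseBoolean(
                pcProperties.getProperty(PIG_EXEC_NO_COMBINER, "false"));
        // Patch's version: read from the conf built from those same properties.
        boolean fromConf = getBoolean(conf, PIG_EXEC_NO_COMBINER, false);

        if (fromProperties != fromConf) throw new AssertionError("reads disagree");
        System.out.println("both reads: " + fromConf);
    }
}
```

Since the conf is populated from pc.getProperties(), both reads see the same value; the difference is readability only.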


> On Dec. 16, 2015, 6:57 a.m., kelly zhang wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LocalRearrangeConverter.java,
> >  line 64
> > <https://reviews.apache.org/r/40743/diff/2/?file=1155793#file1155793line64>
> >
> >     please don't remove this kind of code
> >         if (LOG.isDebugEnabled()) {
> >                     LOG.debug("LocalRearrangeFunction in " + t);
> >                 }
> >                 
> >     Developers use it to print messages to the log file; users will not see them.

I have added the debug statement, not removed it; it wasn't there before.


> On Dec. 16, 2015, 6:57 a.m., kelly zhang wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PigSecondaryKeyComparatorSpark.java,
> >  line 29
> > <https://reviews.apache.org/r/40743/diff/3/?file=1156752#file1156752line29>
> >
> >     remove the comment.

Oops. My bad. Will remove.


> On Dec. 16, 2015, 6:57 a.m., kelly zhang wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PreCombinerLocalRearrangeConverter.java,
> >  line 35
> > <https://reviews.apache.org/r/40743/diff/3/?file=1156753#file1156753line35>
> >
> >     PreCombinerLocalRearrangeConverter's code is almost the same as 
> > LocalRearrangeConverter's code. You can reuse LocalRearrangeConverter.
> >     
> >     If I missed something, please tell me.

Yes, the code is very similar; only the operator is different. I can change 
LocalRearrangeConverter to take a PhysicalOperator rather than a 
POLocalRearrange. That way, the same class can be re-used. Fine?
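A minimal sketch of that generalization, with toy stand-ins for the Pig operator and converter types (the real class signatures in the patch may differ):

```java
import java.util.Arrays;
import java.util.List;

// Toy stand-ins for Pig's physical operator hierarchy.
class PhysicalOperator {
    List<String> attachAndPull(String input) {
        return Arrays.asList("rearranged:" + input);
    }
}
class POLocalRearrange extends PhysicalOperator { }
class POPreCombinerLocalRearrange extends PhysicalOperator { }

// One converter parameterized on the base type serves both operators,
// instead of two near-identical converter classes.
class LocalRearrangeConverter {
    private final PhysicalOperator op;  // was POLocalRearrange before the change
    LocalRearrangeConverter(PhysicalOperator op) { this.op = op; }
    List<String> convert(String input) { return op.attachAndPull(input); }
}

public class ConverterReuse {
    public static void main(String[] args) {
        LocalRearrangeConverter a = new LocalRearrangeConverter(new POLocalRearrange());
        LocalRearrangeConverter b = new LocalRearrangeConverter(new POPreCombinerLocalRearrange());
        // Both operators flow through the same converter code path.
        System.out.println(a.convert("t").equals(b.convert("t")));
    }
}
```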


> On Dec. 16, 2015, 6:57 a.m., kelly zhang wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/optimizer/SparkCombinerOptimizer.java,
> >  line 63
> > <https://reviews.apache.org/r/40743/diff/2/?file=1155798#file1155798line63>
> >
> >     Rename SparkCombinerOptimizer to CombinerOptimizer. The class is in the 
> > spark package; there is no need to emphasize Spark in the name.

Ok. Will do.


> On Dec. 16, 2015, 6:57 a.m., kelly zhang wrote:
> > test/org/apache/pig/test/TestCombiner.java, line 263
> > <https://reviews.apache.org/r/40743/diff/2/?file=1155802#file1155802line263>
> >
> >     It is nice to add new unit tests, but can we ensure that they pass in 
> > Tez and MR mode?

It is a straightforward Pig script, so it should pass in other exec modes too. 
Will double check.


> On Dec. 16, 2015, 6:57 a.m., kelly zhang wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/optimizer/SparkCombinerOptimizer.java,
> >  line 87
> > <https://reviews.apache.org/r/40743/diff/3/?file=1156756#file1156756line87>
> >
> >     After reading your code, I found that the following should be:
> >         // Output:
> >         // foreach (using algebraicOp.Final)
> >         //   -> reduceBy (uses algebraicOp.Intermediate)
> >         //      -> localRearrange
> >         //         -> foreach (using algebraicOp.Initial)

Yep. Will change. It should read:
    // Output:
    // foreach (using algebraicOp.Final)
    //   -> reduceBy (uses algebraicOp.Intermediate)
    //      -> localRearrange
    //         -> foreach (using algebraicOp.Initial)
    //             -> CombinerRearrange
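For intuition, here is a toy Java sketch (not the actual Pig operators) of the Initial/Intermediate/Final stages that the rewritten plan executes, using COUNT as the algebraic operation:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class AlgebraicSketch {
    public static void main(String[] args) {
        // Toy algebraic COUNT: Initial maps each tuple to 1, Intermediate sums
        // partial counts (the function reduceBy applies), Final emits the result.
        List<String> tuples = Arrays.asList("a", "b", "c", "d");

        // foreach (using algebraicOp.Initial)
        List<Long> initial = tuples.stream().map(t -> 1L).collect(Collectors.toList());

        // reduceBy (uses algebraicOp.Intermediate): a pairwise combination, so
        // partial aggregates can be merged map-side before the shuffle.
        long intermediate = initial.stream().reduce(0L, Long::sum);

        // foreach (using algebraicOp.Final)
        long result = intermediate;
        System.out.println("count = " + result);
    }
}
```

Because the Intermediate step is associative, partial results can be combined before data crosses the network, which is what replacing CoGroup with reduceBy buys.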


- Pallavi


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40743/#review109469
-----------------------------------------------------------


On Dec. 9, 2015, 5:49 a.m., Pallavi Rao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40743/
> -----------------------------------------------------------
> 
> (Updated Dec. 9, 2015, 5:49 a.m.)
> 
> 
> Review request for pig, Mohit Sabharwal and Xuefu Zhang.
> 
> 
> Bugs: PIG-4709
>     https://issues.apache.org/jira/browse/PIG-4709
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> Currently, the GROUP BY operator of Pig is mapped to Spark's CoGroup. When the 
> grouped data is consumed by subsequent operations to perform algebraic 
> operations, this is sub-optimal, as there is a lot of shuffle traffic.
> The Spark plan must be optimized to use reduceBy, where possible, so that a 
> combiner is used.
> 
> Introduced a combiner optimizer that does the following:
>     // Checks for algebraic operations and if they exist.
>     // Replaces global rearrange (cogroup) with reduceBy as follows:
>     // Input:
>     // foreach (using algebraicOp)
>     //   -> packager
>     //      -> globalRearrange
>     //          -> localRearrange
>     // Output:
>     // foreach (using algebraicOp.Final)
>     //   -> reduceBy (uses algebraicOp.Intermediate)
>     //      -> foreach (using algebraicOp.Initial)
>     //          -> localRearrange
> 
> 
> Diffs
> -----
> 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java
>  f8c1658 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORollupHIIForEach.java
>  aca347d 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java 
> a4dbadd 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/GlobalRearrangeConverter.java
>  5f74992 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LocalRearrangeConverter.java
>  9ce0492 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PigSecondaryKeyComparatorSpark.java
>  PRE-CREATION 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PreCombinerLocalRearrangeConverter.java
>  PRE-CREATION 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ReduceByConverter.java
>  PRE-CREATION 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/operator/POReduceBySpark.java
>  PRE-CREATION 
>   
> src/org/apache/pig/backend/hadoop/executionengine/spark/optimizer/SparkCombinerOptimizer.java
>  PRE-CREATION 
>   
> src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java
>  6b66ca1 
>   
> src/org/apache/pig/backend/hadoop/executionengine/util/SecondaryKeyOptimizerUtil.java
>  546d91e 
>   test/org/apache/pig/test/TestCombiner.java df44293 
> 
> Diff: https://reviews.apache.org/r/40743/diff/
> 
> 
> Testing
> -------
> 
> The patch unblocked one UT in TestCombiner. Added another UT in the same 
> class. Also did some manual testing.
> 
> 
> Thanks,
> 
> Pallavi Rao
> 
>
