> On Dec. 16, 2015, 6:57 a.m., kelly zhang wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java, line 229
> > <https://reviews.apache.org/r/40743/diff/2/?file=1155791#file1155791line229>
> >
> >     Change
> >         boolean noCombiner = conf.getBoolean(PigConfiguration.PIG_EXEC_NO_COMBINER, false);
> >     to
> >         boolean noCombiner = Boolean.parseBoolean(pc.getProperties().getProperty(PigConfiguration.PIG_EXEC_NO_COMBINER, "false"));
> >
> >     We get the configuration from the PigContext, not the Configuration. We can make the same modification to noSecondaryKey and isMultiQuery.
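(For context, the two reads discussed in the comment above are equivalent whenever `conf` is built from the same PigContext properties. A self-contained sketch, with Hadoop's `Configuration` mimicked by a plain map and the real property name `pig.exec.nocombiner` standing in for `PigConfiguration.PIG_EXEC_NO_COMBINER`; the helper names here are illustrative, not Pig's API:)

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class NoCombinerFlagSketch {

    // Stand-in for ConfigurationUtil.toConfiguration(pc.getProperties()):
    // copy every property into a string-to-string map.
    static Map<String, String> toConfiguration(Properties props) {
        Map<String, String> conf = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            conf.put(name, props.getProperty(name));
        }
        return conf;
    }

    // Stand-in for Configuration.getBoolean(name, defaultValue).
    static boolean getBoolean(Map<String, String> conf, String name, boolean dflt) {
        String v = conf.get(name);
        return v == null ? dflt : Boolean.parseBoolean(v);
    }

    public static void main(String[] args) {
        Properties props = new Properties();   // plays the role of pc.getProperties()
        props.setProperty("pig.exec.nocombiner", "true");

        // Reviewer's suggestion: read directly from the PigContext properties.
        boolean viaProperties = Boolean.parseBoolean(
                props.getProperty("pig.exec.nocombiner", "false"));

        // Patch's approach: build the conf from those same properties once,
        // then use a typed getter. The two reads agree by construction.
        Map<String, String> conf = toConfiguration(props);
        boolean viaConf = getBoolean(conf, "pig.exec.nocombiner", false);

        System.out.println(viaProperties == viaConf);
    }
}
```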
The conf is created locally from the PigContext itself:

    Configuration conf = ConfigurationUtil.toConfiguration(pc.getProperties());

The code is more readable with my change, without having to do pc.getProperties() every time. Functionality is not affected.


> On Dec. 16, 2015, 6:57 a.m., kelly zhang wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LocalRearrangeConverter.java, line 64
> > <https://reviews.apache.org/r/40743/diff/2/?file=1155793#file1155793line64>
> >
> >     Please don't remove this kind of code:
> >         if (LOG.isDebugEnabled()) {
> >             LOG.debug("LocalRearrangeFunction in " + t);
> >         }
> >     Developers use it to print messages to the log file. Users will not see it.

I have added the debug statement, not removed it. It wasn't there before.


> On Dec. 16, 2015, 6:57 a.m., kelly zhang wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PigSecondaryKeyComparatorSpark.java, line 29
> > <https://reviews.apache.org/r/40743/diff/3/?file=1156752#file1156752line29>
> >
> >     Remove the comment.

Oops, my bad. Will remove.


> On Dec. 16, 2015, 6:57 a.m., kelly zhang wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PreCombinerLocalRearrangeConverter.java, line 35
> > <https://reviews.apache.org/r/40743/diff/3/?file=1156753#file1156753line35>
> >
> >     PreCombinerLocalRearrangeConverter's code is almost the same as LocalRearrangeConverter's code. You can reuse LocalRearrangeConverter.
> >
> >     If I missed something, please tell me.

Yes, the code is very similar; only the operator is different. I can change LocalRearrangeConverter to take a PhysicalOperator rather than a POLocalRearrange. That way, the same class can be re-used. Fine?


> On Dec. 16, 2015, 6:57 a.m., kelly zhang wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/optimizer/SparkCombinerOptimizer.java, line 63
> > <https://reviews.apache.org/r/40743/diff/2/?file=1155798#file1155798line63>
> >
> >     Rename SparkCombinerOptimizer to CombinerOptimizer. This class is in the spark package; no need to emphasize Spark in SparkCombinerOptimizer.

Ok. Will do.


> On Dec. 16, 2015, 6:57 a.m., kelly zhang wrote:
> > test/org/apache/pig/test/TestCombiner.java, line 263
> > <https://reviews.apache.org/r/40743/diff/2/?file=1155802#file1155802line263>
> >
> >     It is nice to add new unit tests, but can we ensure that they pass in tez and mr mode?

It is a straightforward Pig script; it should pass in the other exec modes too. Will double-check.


> On Dec. 16, 2015, 6:57 a.m., kelly zhang wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/optimizer/SparkCombinerOptimizer.java, line 87
> > <https://reviews.apache.org/r/40743/diff/3/?file=1156756#file1156756line87>
> >
> >     After reading your code, I found that the following should be:
> >         // Output:
> >         // foreach (using algebraicOp.Final)
> >         //   -> reduceBy (uses algebraicOp.Intermediate)
> >         //   -> localRearrange
> >         //   -> foreach (using algebraicOp.Initial)

Yep. Will change. It should read:

    // Output:
    // foreach (using algebraicOp.Final)
    //   -> reduceBy (uses algebraicOp.Intermediate)
    //   -> localRearrange
    //   -> foreach (using algebraicOp.Initial)
    //   -> CombinerRearrange


- Pallavi


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40743/#review109469
-----------------------------------------------------------


On Dec. 9, 2015, 5:49 a.m., Pallavi Rao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail.
> To reply, visit:
> https://reviews.apache.org/r/40743/
> -----------------------------------------------------------
> 
> (Updated Dec. 9, 2015, 5:49 a.m.)
> 
> 
> Review request for pig, Mohit Sabharwal and Xuefu Zhang.
> 
> 
> Bugs: PIG-4709
>     https://issues.apache.org/jira/browse/PIG-4709
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> Currently, the GROUP BY operator of Pig is mapped to Spark's CoGroup. When the grouped
> data is consumed by subsequent operations to perform algebraic operations, this is
> sub-optimal, as there is a lot of shuffle traffic. The Spark plan must be optimized to
> use reduceBy, where possible, so that a combiner is used.
> 
> Introduced a combiner optimizer that does the following:
>     // Checks for algebraic operations and, if they exist,
>     // replaces global rearrange (cogroup) with reduceBy as follows:
>     // Input:
>     // foreach (using algebraicOp)
>     //   -> packager
>     //   -> globalRearrange
>     //   -> localRearrange
>     // Output:
>     // foreach (using algebraicOp.Final)
>     //   -> reduceBy (uses algebraicOp.Intermediate)
>     //   -> foreach (using algebraicOp.Initial)
>     //   -> localRearrange
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java f8c1658
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORollupHIIForEach.java aca347d
>   src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java a4dbadd
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/GlobalRearrangeConverter.java 5f74992
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LocalRearrangeConverter.java 9ce0492
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PigSecondaryKeyComparatorSpark.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PreCombinerLocalRearrangeConverter.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ReduceByConverter.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/spark/operator/POReduceBySpark.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/spark/optimizer/SparkCombinerOptimizer.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java 6b66ca1
>   src/org/apache/pig/backend/hadoop/executionengine/util/SecondaryKeyOptimizerUtil.java 546d91e
>   test/org/apache/pig/test/TestCombiner.java df44293
> 
> Diff: https://reviews.apache.org/r/40743/diff/
> 
> 
> Testing
> -------
> 
> The patch unblocked one UT in TestCombiner. Added another UT in the same class. Also
> did some manual testing.
> 
> 
> Thanks,
> 
> Pallavi Rao
> 
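(The plan rewrite described above relies on Pig's algebraic UDF contract: Initial runs per input tuple before the shuffle, Intermediate is what reduceBy applies as a combiner, and Final finishes each per-key aggregate. A hedged, self-contained sketch of that contract using COUNT, with Spark's reduceByKey simulated by a per-key merge over a record list; all names here are illustrative, not the patch's classes:)

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AlgebraicCountSketch {

    static long initial(Object tuple) {          // algebraicOp.Initial: emit 1 per input tuple
        return 1L;
    }

    static long intermediate(long a, long b) {   // algebraicOp.Intermediate: merge partial counts
        return a + b;
    }

    static long fin(long partial) {              // algebraicOp.Final: finish the aggregate
        return partial;
    }

    public static void main(String[] args) {
        // (key, value) records standing in for the output of localRearrange.
        List<Map.Entry<String, Object>> records = List.of(
                Map.entry("a", "t1"), Map.entry("b", "t2"),
                Map.entry("a", "t3"), Map.entry("a", "t4"));

        // reduceBy stage: apply Initial per record, then merge per key with
        // Intermediate. Because Intermediate is associative and commutative,
        // this merge can run partition-locally before the shuffle.
        Map<String, Long> partials = new HashMap<>();
        for (Map.Entry<String, Object> rec : records) {
            partials.merge(rec.getKey(), initial(rec.getValue()),
                    AlgebraicCountSketch::intermediate);
        }

        // Final foreach: finish each per-key aggregate.
        Map<String, Long> counts = new HashMap<>();
        partials.forEach((k, v) -> counts.put(k, fin(v)));

        System.out.println(counts.get("a") + " " + counts.get("b"));  // 3 1
    }
}
```

The key point the optimizer depends on is that Intermediate can be applied any number of times, in any grouping, without changing the final answer, which is exactly what makes it safe to hand to reduceBy as a combiner.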
