----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23804/#review48543 -----------------------------------------------------------
Quang-Nhat HOANG-XUAN, thank you very much for the patch. I briefly went it over and found some high-level issues to be address- 1) Please rebase your patch to trunk. New features are only committed into trunk. 2) Please avoid using hard-coded strings. Change them to static variables, etc. 3) Please use primitives instead of objects where possible. JobConf has set/get[Type] methods, so no need to convert primitives to objects and call toString(). 4) Please avoid adding unnecessary object fields and instead use local variables. 5) Please make PORollupH2IRGForEach a subclass of POForEach. This seems to save quite a few code changes. 6) Please use camel case for variable/field/method names and avoid underscores. 7) Please remove "this" when unnecessary. e.g. this.rollup_position => rollup_position. 8) Please do not use tabs for indentation but use 4 whitespaces. (You can set up IDE for this.) 9) Please delete all the trailing whitespaces. (You can set up IDE for this.) I stopped reviewing it for now. Once you clean up your patch a bit, I will resume my review and try to run tests. Thanks! trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java <https://reviews.apache.org/r/23804/#comment85197> Please use class.getName() instead of hardcoding the class name. trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java <https://reviews.apache.org/r/23804/#comment85198> Please add the property to PigConfiguration as a static variable. Also, can you rename the property to "pig.xxx"? trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java <https://reviews.apache.org/r/23804/#comment85199> Use class.getName(). trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java <https://reviews.apache.org/r/23804/#comment85201> Please add the properties to PigConfiguration and rename them to "pig.xxx". trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java <https://reviews.apache.org/r/23804/#comment85202> class.getName() trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MultiQueryOptimizer.java <https://reviews.apache.org/r/23804/#comment85203> Is it a bad idea to make PORollupH2IRGForEach be a subclass of POForEach? trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java <https://reviews.apache.org/r/23804/#comment85204> Integer => int. No need for Integer. Also, you can simplify Integer.parseInt(jConf.get()) to jConf.getInt(). trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java <https://reviews.apache.org/r/23804/#comment85210> Why are these object fields? Can't you change them to local variables? trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java <https://reviews.apache.org/r/23804/#comment85213> Unused. Delete? trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java <https://reviews.apache.org/r/23804/#comment85209> Unused, delete? trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java <https://reviews.apache.org/r/23804/#comment85211> choosen is initialized to 0 and never updated, so choosen % numPartitions will be always 0. Is this intentional? If so, use a constant instead of a variable? trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java <https://reviews.apache.org/r/23804/#comment85214> Can you please add a comment explaining what you're doing here? It's not obvious to me. trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java <https://reviews.apache.org/r/23804/#comment85206> Add @Override. trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java <https://reviews.apache.org/r/23804/#comment85205> Use conf.getInt(). trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java <https://reviews.apache.org/r/23804/#comment85223> Change Boolean to boolean. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java <https://reviews.apache.org/r/23804/#comment85217> Chanage Integer to int. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java <https://reviews.apache.org/r/23804/#comment85216> Change Integer to int. I guess you use Integer because JobControlCompiler sets the properties in JobConf as string. But you can use jobConf.setInt() instead. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java <https://reviews.apache.org/r/23804/#comment85221> Chanage Integer to int. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java <https://reviews.apache.org/r/23804/#comment85220> Chanage Integer to int. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java <https://reviews.apache.org/r/23804/#comment85219> Chanage Integer to int. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java <https://reviews.apache.org/r/23804/#comment85218> Chanage Integer to int. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java <https://reviews.apache.org/r/23804/#comment85222> funcSpec.getClassName() already returns string, so no need to call toString() again. Also, can you use class.getName() instead of hardcoding the class name? Lastly, remove "//&& isOptimized (disable rule)". trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java <https://reviews.apache.org/r/23804/#comment85224> Here the indentation is confusing. Does "((RollupDimensions)func).setRollupH2IRGOptimizable(this.rollupH2IRGoptimizable);" belong to "if(this.pivot!=null)"? If so, please surround the if body with {}. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/DotPOPrinter.java <https://reviews.apache.org/r/23804/#comment85225> If PORollupH2IRGForEach is a subclass of POForEach, you don't this duplicate code, do you? trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java <https://reviews.apache.org/r/23804/#comment85226> Again, if PORollupH2IRGForEach is a subclass of POForEach, you don't this duplicate code, do you? trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PlanPrinter.java <https://reviews.apache.org/r/23804/#comment85227> Again, if PORollupH2IRGForEach is a subclass of POForEach, you don't this duplicate code, do you? trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORollupH2IRGForEach.java <https://reviews.apache.org/r/23804/#comment85228> Please use primitives. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORollupH2IRGForEach.java <https://reviews.apache.org/r/23804/#comment85229> Make this private? Also, please remove _'s in the method name. trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java <https://reviews.apache.org/r/23804/#comment85402> Same as above. getFuncSpec() already returns string, so no need to call toString() again. Also, please use class.getName() instead of hardcoding the class name. trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java <https://reviews.apache.org/r/23804/#comment85400> Please use primitives. trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java <https://reviews.apache.org/r/23804/#comment85403> Why is this protected? Change to private? - Cheolsoo Park On July 22, 2014, 9:20 a.m., Quang-Nhat HOANG-XUAN wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/23804/ > ----------------------------------------------------------- > > (Updated July 22, 2014, 9:20 a.m.) > > > Review request for pig. > > > Bugs: PIG-4066 > https://issues.apache.org/jira/browse/PIG-4066 > > > Repository: pig > > > Description > ------- > > This patch aims at addressing the current limitation of the ROLLUP operator > in PIG: most of the work is done in the Map phase of the underlying MapReduce > job to generate all possible intermediate keys that the reducer use to > aggregate and produce the ROLLUP output. Based on our previous work: > “Duy-Hung Phan, Matteo Dell’Amico, Pietro Michiardi: On the design space of > MapReduce ROLLUP aggregates” > (http://www.eurecom.fr/en/publication/4212/download/rs-publi-4212_2.pdf), we > show that the design space for a ROLLUP implementation allows for a different > approach (in-reducer grouping, IRG), in which less work is done in the Map > phase and the grouping is done in the Reduce phase. This patch presents the > most efficient implementation we designed (Hybrid IRG), which allows defining > a parameter to balance between parallelism (in the reducers) and > communication cost. > This patch contains the following features: > 1. The new ROLLUP approach: IRG, Hybrid IRG. > 2. The PIVOT clause in CUBE operators. > 3. Test cases. > The new syntax to use our ROLLUP approach: > alias = CUBE rel BY > { CUBE col_ref | ROLLUP col_ref [PIVOT pivot_value]} [, { CUBE col_ref | > ROLLUP col_ref [PIVOT pivot_value]} > ...] > In case there is multiple ROLLUP operator in one CUBE clause, the last ROLLUP > operator will be executed with our approach (IRG, Hybrid IRG) while the > remaining ROLLUP ahead will be executed with the default approach. > We have already made some experiments for comparison between our ROLLUP > implementation and the current ROLLUP. More information can be found at here: > http://hxquangnhat.github.io/PIG-ROLLUP-H2IRG/ > > > Diffs > ----- > > > trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MultiQueryOptimizer.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PhyPlanSetter.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java > PRE-CREATION > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/DotPOPrinter.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PlanPrinter.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORollupH2IRGForEach.java > PRE-CREATION > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/util/PlanHelper.java > 1579421 > trunk/src/org/apache/pig/builtin/RollupDimensions.java 1579421 > trunk/src/org/apache/pig/newplan/logical/DotLOPrinter.java 1579421 > > trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java > 1579421 > trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java > 1579421 > > trunk/src/org/apache/pig/newplan/logical/optimizer/LogicalPlanOptimizer.java > 1579421 > trunk/src/org/apache/pig/newplan/logical/optimizer/LogicalPlanPrinter.java > 1579421 > trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java > 1579421 > trunk/src/org/apache/pig/newplan/logical/optimizer/UidResetter.java 1579421 > trunk/src/org/apache/pig/newplan/logical/relational/LOCogroup.java 1579421 > trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java 1579421 > > trunk/src/org/apache/pig/newplan/logical/relational/LORollupH2IRGForEach.java > PRE-CREATION > > trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java > 1579421 > > trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java > 1579421 > trunk/src/org/apache/pig/newplan/logical/rules/OptimizerUtils.java 1579421 > trunk/src/org/apache/pig/newplan/logical/rules/RollupH2IRGOptimizer.java > PRE-CREATION > trunk/src/org/apache/pig/parser/AliasMasker.g 1579421 > trunk/src/org/apache/pig/parser/AstPrinter.g 1579421 > trunk/src/org/apache/pig/parser/AstValidator.g 1579421 > trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1579421 > trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1579421 > trunk/src/org/apache/pig/parser/QueryLexer.g 1579421 > trunk/src/org/apache/pig/parser/QueryParser.g 1579421 > trunk/test/org/apache/pig/test/TestCubeOperator.java 1579421 > trunk/test/org/apache/pig/test/TestNewPlanLogToPhyTranslationVisitor.java > 1579421 > trunk/test/org/apache/pig/test/utils/GenPhyOp.java 1579421 > > Diff: https://reviews.apache.org/r/23804/diff/ > > > Testing > ------- > > > Thanks, > > Quang-Nhat HOANG-XUAN > >
