> On July 24, 2014, 7:46 p.m., Cheolsoo Park wrote: > > Quang-Nhat HOANG-XUAN, thank you very much for the patch. > > > > I briefly went it over and found some high-level issues to be address- > > 1) Please rebase your patch to trunk. New features are only committed into > > trunk. > > 2) Please avoid using hard-coded strings. Change them to static variables, > > etc. > > 3) Please use primitives instead of objects where possible. JobConf has > > set/get[Type] methods, so no need to convert primitives to objects and call > > toString(). > > 4) Please avoid adding unnecessary object fields and instead use local > > variables. > > 5) Please make PORollupH2IRGForEach a subclass of POForEach. This seems to > > save quite a few code changes. > > 6) Please use camel case for variable/field/method names and avoid > > underscores. > > 7) Please remove "this" when unnecessary. e.g. this.rollup_position => > > rollup_position. > > 8) Please do not use tabs for indentation but use 4 whitespaces. (You can > > set up IDE for this.) > > 9) Please delete all the trailing whitespaces. (You can set up IDE for > > this.) > > > > I stopped reviewing it for now. Once you clean up your patch a bit, I will > > resume my review and try to run tests. > > > > Thanks!
Thank you so much! I will fix them. - Quang-Nhat ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23804/#review48543 ----------------------------------------------------------- On July 22, 2014, 9:20 a.m., Quang-Nhat HOANG-XUAN wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/23804/ > ----------------------------------------------------------- > > (Updated July 22, 2014, 9:20 a.m.) > > > Review request for pig. > > > Bugs: PIG-4066 > https://issues.apache.org/jira/browse/PIG-4066 > > > Repository: pig > > > Description > ------- > > This patch aims at addressing the current limitation of the ROLLUP operator > in PIG: most of the work is done in the Map phase of the underlying MapReduce > job to generate all possible intermediate keys that the reducer use to > aggregate and produce the ROLLUP output. Based on our previous work: > “Duy-Hung Phan, Matteo Dell’Amico, Pietro Michiardi: On the design space of > MapReduce ROLLUP aggregates” > (http://www.eurecom.fr/en/publication/4212/download/rs-publi-4212_2.pdf), we > show that the design space for a ROLLUP implementation allows for a different > approach (in-reducer grouping, IRG), in which less work is done in the Map > phase and the grouping is done in the Reduce phase. This patch presents the > most efficient implementation we designed (Hybrid IRG), which allows defining > a parameter to balance between parallelism (in the reducers) and > communication cost. > This patch contains the following features: > 1. The new ROLLUP approach: IRG, Hybrid IRG. > 2. The PIVOT clause in CUBE operators. > 3. Test cases. > The new syntax to use our ROLLUP approach: > alias = CUBE rel BY > { CUBE col_ref | ROLLUP col_ref [PIVOT pivot_value]} [, { CUBE col_ref | > ROLLUP col_ref [PIVOT pivot_value]} > ...] > In case there is multiple ROLLUP operator in one CUBE clause, the last ROLLUP > operator will be executed with our approach (IRG, Hybrid IRG) while the > remaining ROLLUP ahead will be executed with the default approach. > We have already made some experiments for comparison between our ROLLUP > implementation and the current ROLLUP. More information can be found at here: > http://hxquangnhat.github.io/PIG-ROLLUP-H2IRG/ > > > Diffs > ----- > > > trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MultiQueryOptimizer.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PhyPlanSetter.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java > PRE-CREATION > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/DotPOPrinter.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PlanPrinter.java > 1579421 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORollupH2IRGForEach.java > PRE-CREATION > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/util/PlanHelper.java > 1579421 > trunk/src/org/apache/pig/builtin/RollupDimensions.java 1579421 > trunk/src/org/apache/pig/newplan/logical/DotLOPrinter.java 1579421 > > trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java > 1579421 > trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java > 1579421 > > trunk/src/org/apache/pig/newplan/logical/optimizer/LogicalPlanOptimizer.java > 1579421 > trunk/src/org/apache/pig/newplan/logical/optimizer/LogicalPlanPrinter.java > 1579421 > trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java > 1579421 > trunk/src/org/apache/pig/newplan/logical/optimizer/UidResetter.java 1579421 > trunk/src/org/apache/pig/newplan/logical/relational/LOCogroup.java 1579421 > trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java 1579421 > > trunk/src/org/apache/pig/newplan/logical/relational/LORollupH2IRGForEach.java > PRE-CREATION > > trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java > 1579421 > > trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java > 1579421 > trunk/src/org/apache/pig/newplan/logical/rules/OptimizerUtils.java 1579421 > trunk/src/org/apache/pig/newplan/logical/rules/RollupH2IRGOptimizer.java > PRE-CREATION > trunk/src/org/apache/pig/parser/AliasMasker.g 1579421 > trunk/src/org/apache/pig/parser/AstPrinter.g 1579421 > trunk/src/org/apache/pig/parser/AstValidator.g 1579421 > trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1579421 > trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1579421 > trunk/src/org/apache/pig/parser/QueryLexer.g 1579421 > trunk/src/org/apache/pig/parser/QueryParser.g 1579421 > trunk/test/org/apache/pig/test/TestCubeOperator.java 1579421 > trunk/test/org/apache/pig/test/TestNewPlanLogToPhyTranslationVisitor.java > 1579421 > trunk/test/org/apache/pig/test/utils/GenPhyOp.java 1579421 > > Diff: https://reviews.apache.org/r/23804/diff/ > > > Testing > ------- > > > Thanks, > > Quang-Nhat HOANG-XUAN > >
