-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23804/#review64932
-----------------------------------------------------------

Ship it!

Ran unit tests and e2e tests.

- Cheolsoo Park


On Dec. 7, 2014, 12:32 p.m., Quang-Nhat HOANG-XUAN wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23804/
> -----------------------------------------------------------
>
> (Updated Dec. 7, 2014, 12:32 p.m.)
>
>
> Review request for pig.
>
>
> Bugs: PIG-4066
>     https://issues.apache.org/jira/browse/PIG-4066
>
>
> Repository: pig
>
>
> Description
> -------
>
> This patch addresses a limitation of the current ROLLUP operator in Pig:
> most of the work is done in the Map phase of the underlying MapReduce job,
> which generates all possible intermediate keys that the reducers use to
> aggregate and produce the ROLLUP output. Based on our previous work,
> “Duy-Hung Phan, Matteo Dell’Amico, Pietro Michiardi: On the design space of
> MapReduce ROLLUP aggregates”
> (http://www.eurecom.fr/en/publication/4212/download/rs-publi-4212_2.pdf), we
> show that the design space for a ROLLUP implementation allows for a different
> approach (in-reducer grouping, IRG), in which less work is done in the Map
> phase and the grouping is done in the Reduce phase. This patch presents the
> most efficient implementation we designed (Hybrid IRG), which exposes a
> parameter to balance parallelism (in the reducers) against communication cost.
>
> This patch contains the following features:
> 1. The new ROLLUP approach: IRG, Hybrid IRG.
> 2. The PIVOT clause in CUBE operators.
> 3. Test cases.
>
> The new syntax to use our ROLLUP approach:
>
> alias = CUBE rel BY
>     { CUBE col_ref | ROLLUP col_ref [PIVOT pivot_value] }
>     [, { CUBE col_ref | ROLLUP col_ref [PIVOT pivot_value] } ...]
>
> If one CUBE clause contains multiple ROLLUP operators, the last ROLLUP
> operator is executed with our approach (IRG, Hybrid IRG), while the ROLLUP
> operators that precede it are executed with the default approach.
>
> We have already run experiments comparing our ROLLUP implementation with
> the current ROLLUP. More information can be found here:
> http://hxquangnhat.github.io/PIG-ROLLUP-H2IRG/
>
>
> Diffs
> -----
>
>   trunk/src/org/apache/pig/Main.java 1642549
>   trunk/src/org/apache/pig/PigConstants.java 1642549
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1642549
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 1642549
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java 1642549
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupHIIPartitioner.java PRE-CREATION
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1642549
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java 1642549
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPackage.java 1642549
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORollupHIIForEach.java PRE-CREATION
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/util/PlanHelper.java 1642549
>   trunk/src/org/apache/pig/builtin/RollupDimensions.java 1642549
>   trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1642549
>   trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1642549
>   trunk/src/org/apache/pig/newplan/logical/optimizer/LogicalPlanOptimizer.java 1642549
>   trunk/src/org/apache/pig/newplan/logical/relational/LOCogroup.java 1642549
>   trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java 1642549
>   trunk/src/org/apache/pig/newplan/logical/relational/LORollupHIIForEach.java PRE-CREATION
>   trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 1642549
>   trunk/src/org/apache/pig/newplan/logical/relational/LogicalPlan.java 1642549
>   trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java 1642549
>   trunk/src/org/apache/pig/newplan/logical/rules/OptimizerUtils.java 1642549
>   trunk/src/org/apache/pig/newplan/logical/rules/RollupHIIOptimizer.java PRE-CREATION
>   trunk/src/org/apache/pig/parser/AliasMasker.g 1642549
>   trunk/src/org/apache/pig/parser/AstPrinter.g 1642549
>   trunk/src/org/apache/pig/parser/AstValidator.g 1642549
>   trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1642549
>   trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1642549
>   trunk/src/org/apache/pig/parser/QueryLexer.g 1642549
>   trunk/src/org/apache/pig/parser/QueryParser.g 1642549
>   trunk/test/org/apache/pig/test/TestCubeOperator.java 1642549
>
> Diff: https://reviews.apache.org/r/23804/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Quang-Nhat HOANG-XUAN
>
>
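
The trade-off described in the quoted summary (expand every rollup key in the Map phase, or ship one key per record and group in the reducer) can be illustrated with a rough Python sketch. This is not code from the patch: the function names, the sample records, and the use of a sum as the aggregate are assumptions made for illustration, a single reducer is assumed, and the Hybrid IRG pivot is not modeled.

```python
from collections import defaultdict


def level_key(dims, k):
    """Keep the first k dimensions and null out the rest (one rollup level)."""
    return tuple(dims[:k]) + (None,) * (len(dims) - k)


def rollup_map_side(records):
    """Map-side expansion: every record is emitted once per rollup level,
    so the shuffle carries (n + 1) records per input record."""
    emitted = [(level_key(dims, k), value)
               for dims, value in records
               for k in range(len(dims), -1, -1)]
    totals = defaultdict(int)
    for key, value in emitted:          # the reducer only has to sum per key
        totals[key] += value
    return dict(totals), len(emitted)


def rollup_in_reducer(records):
    """In-reducer grouping (illustrative, single reducer): the mapper emits
    each record once; the reducer walks the sorted keys and flushes a level
    whenever the prefix for that level changes."""
    shuffled = sorted(records)          # one record per input record
    n = len(shuffled[0][0]) if shuffled else 0
    partial = [0] * (n + 1)             # partial[k]: running sum for prefix length k
    totals = {}
    prev = None
    for dims, value in shuffled:
        if prev is not None and dims != prev:
            # First position where the key changed; deeper levels are complete.
            diff = next(i for i in range(n) if dims[i] != prev[i])
            for k in range(n, diff, -1):
                totals[level_key(prev, k)] = partial[k]
                partial[k - 1] += partial[k]   # propagate to the parent level
                partial[k] = 0
        partial[n] += value
        prev = dims
    if prev is not None:                # flush whatever is still open
        for k in range(n, 0, -1):
            totals[level_key(prev, k)] = partial[k]
            partial[k - 1] += partial[k]
        totals[level_key(prev, 0)] = partial[0]
    return totals, len(shuffled)


if __name__ == "__main__":
    # Hypothetical (year, month, day) -> count records.
    sample = [(("2014", "12", "07"), 3), (("2014", "12", "08"), 4),
              (("2014", "11", "30"), 5), (("2013", "01", "01"), 1)]
    print(rollup_map_side(sample)[1], rollup_in_reducer(sample)[1])
```

Both functions produce the same aggregates; the difference is how many records cross the shuffle. The in-reducer variant sends fewer records but, as sketched here, funnels everything through one reducer, which is the parallelism versus communication balance the summary says the Hybrid IRG parameter is meant to tune.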