-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23804/
-----------------------------------------------------------

(Updated Sept. 11, 2014, 2:25 p.m.)


Review request for pig.


Bugs: PIG-4066
    https://issues.apache.org/jira/browse/PIG-4066


Repository: pig


Description
-------

This patch addresses a limitation of the current ROLLUP operator in Pig: most 
of the work is done in the Map phase of the underlying MapReduce job, which 
generates every intermediate key that the reducers use to aggregate and 
produce the ROLLUP output. Our previous work, “Duy-Hung Phan, Matteo 
Dell’Amico, Pietro Michiardi: On the design space of MapReduce ROLLUP 
aggregates” 
(http://www.eurecom.fr/en/publication/4212/download/rs-publi-4212_2.pdf), 
shows that the design space for a ROLLUP implementation allows for a different 
approach, in-reducer grouping (IRG), in which less work is done in the Map 
phase and the grouping is done in the Reduce phase. This patch presents the 
most efficient implementation we designed, Hybrid IRG, which exposes a 
parameter to balance parallelism (in the reducers) against communication cost.
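
The trade-off described above can be illustrated with a small Python sketch. 
It is not the code in this patch: the function names are hypothetical, SUM 
stands in for the aggregate, a dictionary replaces the sorted, streaming pass a 
real reducer would use, and the meaning of the pivot (levels at or above it are 
derived in the reducer from the full key, levels below it are emitted by the 
mapper) is an assumption based on the cited paper.

```python
from collections import defaultdict


def level_key(dims, level):
    """Keep the first `level` dimensions and null out the rest."""
    return dims[:level] + (None,) * (len(dims) - level)


def vanilla_map(record, d):
    """Default approach: the mapper emits one key per ROLLUP level (d + 1 keys)."""
    dims, value = record[:d], record[d]
    return [(level_key(dims, level), value) for level in range(d, -1, -1)]


def hybrid_map(record, d, pivot):
    """Hybrid sketch: one full key plus one key per level below the pivot."""
    dims, value = record[:d], record[d]
    keys = [(dims, value)]
    keys += [(level_key(dims, level), value) for level in range(pivot - 1, -1, -1)]
    return keys


def in_reducer_rollup(full_key_records, d, pivot):
    """Reducer side: derive levels pivot..d from the full keys only."""
    totals = defaultdict(int)
    for dims, value in full_key_records:
        for level in range(pivot, d + 1):
            totals[level_key(dims, level)] += value
    return dict(totals)


def rollup_vanilla(records, d):
    """Aggregate everything the default mapper emits."""
    totals = defaultdict(int)
    for record in records:
        for key, value in vanilla_map(record, d):
            totals[key] += value
    return dict(totals)


def rollup_hybrid(records, d, pivot):
    """Combine mapper-emitted low levels with reducer-derived high levels."""
    full, totals = [], defaultdict(int)
    for record in records:
        emitted = hybrid_map(record, d, pivot)
        full.append(emitted[0])
        for key, value in emitted[1:]:
            totals[key] += value
    totals.update(in_reducer_rollup(full, d, pivot))
    return dict(totals)


# (year, month, day, amount): three ROLLUP dimensions and one measure.
records = [(2014, 9, 11, 5), (2014, 9, 12, 7), (2014, 10, 1, 1), (2013, 1, 1, 2)]
```

Under these assumptions the mapper emits pivot + 1 records per input tuple 
instead of d + 1, and both functions produce the same ROLLUP output. A pivot of 
0 behaves like pure IRG (one record per tuple, no parallelism for the grand 
total), while a pivot of d degenerates to the default approach.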

This patch contains the following features:

1. The new ROLLUP approaches: IRG and Hybrid IRG.
2. The PIVOT clause in the CUBE operator.
3. Test cases.

The new syntax for using our ROLLUP approach:

alias = CUBE rel BY
    { CUBE col_ref | ROLLUP col_ref [PIVOT pivot_value] }
    [, { CUBE col_ref | ROLLUP col_ref [PIVOT pivot_value] } ...]

If a CUBE clause contains multiple ROLLUP operators, only the last ROLLUP is 
executed with our approach (IRG or Hybrid IRG); the preceding ROLLUP operators 
are executed with the default approach.
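
The selection rule above can be sketched as follows. This is an illustration 
only: `assign_strategies` and the labels "irg" and "default" are hypothetical 
names, not identifiers from the patch.

```python
def assign_strategies(clauses):
    """Label each clause of one CUBE statement with an execution strategy.

    `clauses` lists the operators in order, e.g. ["ROLLUP", "CUBE", "ROLLUP"].
    Only the last ROLLUP gets the new approach; everything else stays default.
    """
    rollup_positions = [i for i, op in enumerate(clauses) if op == "ROLLUP"]
    last_rollup = rollup_positions[-1] if rollup_positions else None
    return ["irg" if i == last_rollup else "default" for i in range(len(clauses))]
```

For example, a statement with the clauses ROLLUP, CUBE, ROLLUP would run only 
the second ROLLUP with IRG or Hybrid IRG.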

We have already run experiments comparing our ROLLUP implementation with the 
current one. More information can be found here: 
http://hxquangnhat.github.io/PIG-ROLLUP-H2IRG/


Diffs (updated)
-----

  trunk/src/org/apache/pig/Main.java 1624212
  trunk/src/org/apache/pig/PigConfiguration.java 1624212
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1624212
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 1624212
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java 1624212
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupHIIPartitioner.java PRE-CREATION
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1624212
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java 1624212
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPackage.java 1624212
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORollupHIIForEach.java PRE-CREATION
  trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/util/PlanHelper.java 1624212
  trunk/src/org/apache/pig/builtin/RollupDimensions.java 1624212
  trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1624212
  trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1624212
  trunk/src/org/apache/pig/newplan/logical/optimizer/LogicalPlanOptimizer.java 1624212
  trunk/src/org/apache/pig/newplan/logical/relational/LOCogroup.java 1624212
  trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java 1624212
  trunk/src/org/apache/pig/newplan/logical/relational/LORollupHIIForEach.java PRE-CREATION
  trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 1624212
  trunk/src/org/apache/pig/newplan/logical/relational/LogicalPlan.java 1624212
  trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java 1624212
  trunk/src/org/apache/pig/newplan/logical/rules/OptimizerUtils.java 1624212
  trunk/src/org/apache/pig/newplan/logical/rules/RollupHIIOptimizer.java PRE-CREATION
  trunk/src/org/apache/pig/parser/AliasMasker.g 1624212
  trunk/src/org/apache/pig/parser/AstPrinter.g 1624212
  trunk/src/org/apache/pig/parser/AstValidator.g 1624212
  trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1624212
  trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1624212
  trunk/src/org/apache/pig/parser/QueryLexer.g 1624212
  trunk/src/org/apache/pig/parser/QueryParser.g 1624212
  trunk/test/org/apache/pig/test/TestCubeOperator.java 1624212

Diff: https://reviews.apache.org/r/23804/diff/


Testing
-------


Thanks,

Quang-Nhat HOANG-XUAN
