-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23804/#review48543
-----------------------------------------------------------


Quang-Nhat HOANG-XUAN, thank you very much for the patch.

I briefly went it over and found some high-level issues to be address-
1) Please rebase your patch to trunk. New features are only committed into 
trunk.
2) Please avoid using hard-coded strings. Change them to static variables, etc.
3) Please use primitives instead of objects where possible. JobConf has 
set/get[Type] methods, so no need to convert primitives to objects and call 
toString().
4) Please avoid adding unnecessary object fields and instead use local 
variables.
5) Please make PORollupH2IRGForEach a subclass of POForEach. This seems to save 
quite a few code changes.
6) Please use camel case for variable/field/method names and avoid underscores.
7) Please remove "this" when unnecessary. e.g. this.rollup_position => 
rollup_position.
8) Please do not use tabs for indentation but use 4 whitespaces. (You can set 
up IDE for this.)
9) Please delete all the trailing whitespaces. (You can set up IDE for this.)

I stopped reviewing it for now. Once you clean up your patch a bit, I will 
resume my review and try to run tests.

Thanks!


trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
<https://reviews.apache.org/r/23804/#comment85197>

    Please use class.getName() instead of hardcoding the class name.



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
<https://reviews.apache.org/r/23804/#comment85198>

    Please add the property to PigConfiguration as a static variable.
    
    Also, can you rename the property to "pig.xxx"?



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
<https://reviews.apache.org/r/23804/#comment85199>

    Use class.getName().



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
<https://reviews.apache.org/r/23804/#comment85201>

    Please add the properties to PigConfiguration and rename them to "pig.xxx".



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
<https://reviews.apache.org/r/23804/#comment85202>

    class.getName()



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MultiQueryOptimizer.java
<https://reviews.apache.org/r/23804/#comment85203>

    Is it a bad idea to make PORollupH2IRGForEach be a subclass of POForEach?



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java
<https://reviews.apache.org/r/23804/#comment85204>

    Integer => int. No need for Integer.
    
    Also, you can simplify Integer.parseInt(jConf.get()) to jConf.getInt().



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java
<https://reviews.apache.org/r/23804/#comment85210>

    Why are these object fields? Can't you change them to local variables?



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java
<https://reviews.apache.org/r/23804/#comment85213>

    Unused. Delete?



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java
<https://reviews.apache.org/r/23804/#comment85209>

    Unused, delete?



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java
<https://reviews.apache.org/r/23804/#comment85211>

    choosen is initialized to 0 and never updated, so choosen % numPartitions 
will be always 0. Is this intentional? If so, use a constant instead of a 
variable?



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java
<https://reviews.apache.org/r/23804/#comment85214>

    Can you please add a comment explaining what you're doing here? It's not 
obvious to me.



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java
<https://reviews.apache.org/r/23804/#comment85206>

    Add @Override.



trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java
<https://reviews.apache.org/r/23804/#comment85205>

    Use conf.getInt().



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/23804/#comment85223>

    Change Boolean to boolean.



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/23804/#comment85217>

    Chanage Integer to int.



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/23804/#comment85216>

    Change Integer to int. I guess you use Integer because JobControlCompiler 
sets the properties in JobConf as string. But you can use jobConf.setInt() 
instead.



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/23804/#comment85221>

    Chanage Integer to int.



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/23804/#comment85220>

    Chanage Integer to int.



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/23804/#comment85219>

    Chanage Integer to int.



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/23804/#comment85218>

    Chanage Integer to int.



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/23804/#comment85222>

    funcSpec.getClassName() already returns string, so no need to call 
toString() again.
    
    Also, can you use class.getName() instead of hardcoding the class name?
    
    Lastly, remove "//&& isOptimized (disable rule)".



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/23804/#comment85224>

    Here the indentation is confusing. Does 
"((RollupDimensions)func).setRollupH2IRGOptimizable(this.rollupH2IRGoptimizable);"
 belong to "if(this.pivot!=null)"? If so, please surround the if body with {}.



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/DotPOPrinter.java
<https://reviews.apache.org/r/23804/#comment85225>

    If PORollupH2IRGForEach is a subclass of POForEach, you don't this 
duplicate code, do you?



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java
<https://reviews.apache.org/r/23804/#comment85226>

    Again, if PORollupH2IRGForEach is a subclass of POForEach, you don't this 
duplicate code, do you?



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PlanPrinter.java
<https://reviews.apache.org/r/23804/#comment85227>

    Again, if PORollupH2IRGForEach is a subclass of POForEach, you don't this 
duplicate code, do you?



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORollupH2IRGForEach.java
<https://reviews.apache.org/r/23804/#comment85228>

    Please use primitives.



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORollupH2IRGForEach.java
<https://reviews.apache.org/r/23804/#comment85229>

    Make this private? Also, please remove _'s in the method name.



trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java
<https://reviews.apache.org/r/23804/#comment85402>

    Same as above. getFuncSpec() already returns string, so no need to call 
toString() again.
    
    Also, please use class.getName() instead of hardcoding the class name.



trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java
<https://reviews.apache.org/r/23804/#comment85400>

    Please use primitives.



trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java
<https://reviews.apache.org/r/23804/#comment85403>

    Why is this protected? Change to private?


- Cheolsoo Park


On July 22, 2014, 9:20 a.m., Quang-Nhat HOANG-XUAN wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23804/
> -----------------------------------------------------------
> 
> (Updated July 22, 2014, 9:20 a.m.)
> 
> 
> Review request for pig.
> 
> 
> Bugs: PIG-4066
>     https://issues.apache.org/jira/browse/PIG-4066
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> This patch aims at addressing the current limitation of the ROLLUP operator 
> in PIG: most of the work is done in the Map phase of the underlying MapReduce 
> job to generate all possible intermediate keys that the reducer use to 
> aggregate and produce the ROLLUP output. Based on our previous work: 
> “Duy-Hung Phan, Matteo Dell’Amico, Pietro Michiardi: On the design space of 
> MapReduce ROLLUP aggregates” 
> (http://www.eurecom.fr/en/publication/4212/download/rs-publi-4212_2.pdf), we 
> show that the design space for a ROLLUP implementation allows for a different 
> approach (in-reducer grouping, IRG), in which less work is done in the Map 
> phase and the grouping is done in the Reduce phase. This patch presents the 
> most efficient implementation we designed (Hybrid IRG), which allows defining 
> a parameter to balance between parallelism (in the reducers) and 
> communication cost.
> This patch contains the following features:
> 1. The new ROLLUP approach: IRG, Hybrid IRG.
> 2. The PIVOT clause in CUBE operators.
> 3. Test cases.
> The new syntax to use our ROLLUP approach:
> alias = CUBE rel BY
> { CUBE col_ref | ROLLUP col_ref [PIVOT pivot_value]} [, { CUBE col_ref | 
> ROLLUP col_ref [PIVOT pivot_value]}
> ...]
> In case there is multiple ROLLUP operator in one CUBE clause, the last ROLLUP 
> operator will be executed with our approach (IRG, Hybrid IRG) while the 
> remaining ROLLUP ahead will be executed with the default approach.
> We have already made some experiments for comparison between our ROLLUP 
> implementation and the current ROLLUP. More information can be found at here: 
> http://hxquangnhat.github.io/PIG-ROLLUP-H2IRG/
> 
> 
> Diffs
> -----
> 
>   
> trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 
> 1579421 
>   
> trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java
>  1579421 
>   
> trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
>  1579421 
>   
> trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java
>  1579421 
>   
> trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MultiQueryOptimizer.java
>  1579421 
>   
> trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PhyPlanSetter.java
>  1579421 
>   
> trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java
>  1579421 
>   
> trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/RollupH2IRGPartitioner.java
>  PRE-CREATION 
>   
> trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
>  1579421 
>   
> trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/DotPOPrinter.java
>  1579421 
>   
> trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java
>  1579421 
>   
> trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PlanPrinter.java
>  1579421 
>   
> trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORollupH2IRGForEach.java
>  PRE-CREATION 
>   
> trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/util/PlanHelper.java
>  1579421 
>   trunk/src/org/apache/pig/builtin/RollupDimensions.java 1579421 
>   trunk/src/org/apache/pig/newplan/logical/DotLOPrinter.java 1579421 
>   
> trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java
>  1579421 
>   trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 
> 1579421 
>   
> trunk/src/org/apache/pig/newplan/logical/optimizer/LogicalPlanOptimizer.java 
> 1579421 
>   trunk/src/org/apache/pig/newplan/logical/optimizer/LogicalPlanPrinter.java 
> 1579421 
>   trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java 
> 1579421 
>   trunk/src/org/apache/pig/newplan/logical/optimizer/UidResetter.java 1579421 
>   trunk/src/org/apache/pig/newplan/logical/relational/LOCogroup.java 1579421 
>   trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java 1579421 
>   
> trunk/src/org/apache/pig/newplan/logical/relational/LORollupH2IRGForEach.java 
> PRE-CREATION 
>   
> trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java
>  1579421 
>   
> trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java
>  1579421 
>   trunk/src/org/apache/pig/newplan/logical/rules/OptimizerUtils.java 1579421 
>   trunk/src/org/apache/pig/newplan/logical/rules/RollupH2IRGOptimizer.java 
> PRE-CREATION 
>   trunk/src/org/apache/pig/parser/AliasMasker.g 1579421 
>   trunk/src/org/apache/pig/parser/AstPrinter.g 1579421 
>   trunk/src/org/apache/pig/parser/AstValidator.g 1579421 
>   trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1579421 
>   trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1579421 
>   trunk/src/org/apache/pig/parser/QueryLexer.g 1579421 
>   trunk/src/org/apache/pig/parser/QueryParser.g 1579421 
>   trunk/test/org/apache/pig/test/TestCubeOperator.java 1579421 
>   trunk/test/org/apache/pig/test/TestNewPlanLogToPhyTranslationVisitor.java 
> 1579421 
>   trunk/test/org/apache/pig/test/utils/GenPhyOp.java 1579421 
> 
> Diff: https://reviews.apache.org/r/23804/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Quang-Nhat HOANG-XUAN
> 
>

Reply via email to