----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1817/#review1868 -----------------------------------------------------------
trunk/src/org/apache/pig/Main.java <https://reviews.apache.org/r/1817/#comment4282> I would go 1 further and add this property to the default pig.properties, along with the rest listed here. trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java <https://reviews.apache.org/r/1817/#comment4284> there's a lot of errant whitespace around.. you can set your IDE to clean that up trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POLocalRearrange.java <https://reviews.apache.org/r/1817/#comment4285> Not sure about the value of this comment :) trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4286> Leaves, technically :) trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4287> This is an extremely long function. Please refactor into more manageable chunks. That truly helps others to read and follow the code. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4288> It's worth noting how mapDumpIterator be not null, and disableMapAgg be true (I think this is the case when you auto-disable in-map aggregation having processed some of the input, correct?) It might be more readable to just handle the disableMapAgg case at the very top, and not have to check it afterwards: if (disableMapAgg) { if (mapDumpIterator != null) { // deal with iterator } else { return processInput(); } trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4290> extra newline trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4291> extra newlines trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4292> return POStatus.STATUS_EOP would be clearer and not require a comment trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4293> this should be a separate function trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4294> what's at valueTuple[0]? trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4296> catch a class cast exception and rethrow with a meaningful error? ("Intermediate Algebraic functions must implement EvalFunc<Tuple>") . trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4297> newline trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4298> is ignoring nulls really the right thing to do? what if my UDF is "COUNT_NULLS"? Do we need a magical NULL object for this, or does this not work elsewhere already and we don't need to worry about supporting it? trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4299> shouldn't this be output.returnStatus? trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4300> extra newline trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4302> kinda crammed in here :) if (aggMap.size() >= maxHashMapSize) { aaah room to breathe trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4301> this is quite trivial using LinkedHashMap initialized with the three-arg constructor; a one-line change. Worth a benchmark run. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4303> } else { trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4304> Hortonworks must buy you guys really large screens, I am jealous. I'll stop mentioning the newlines now, this is probably starting to annoy you :-). trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4305> get().getInt("pig.exec.mapPartAgg.minReducton", 0); please make the property public static or something else that's referenceable / discoverable trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4306> default should be a constant at the top of the file trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java <https://reviews.apache.org/r/1817/#comment4307> catch the classcastexception, rethrow with something more meaningful about Inermediate EvalFuncs being required to be EvalFunc<Tuple> trunk/src/org/apache/pig/data/SelfSpillBag.java <https://reviews.apache.org/r/1817/#comment4310> nice refactor trunk/src/org/apache/pig/data/SelfSpillBag.java <https://reviews.apache.org/r/1817/#comment4311> pig.cachedbag.memusage should be public static final trunk/test/org/apache/pig/test/TestPOPartialAgg.java <https://reviews.apache.org/r/1817/#comment4314> public static final reference... also, use setInt trunk/test/org/apache/pig/test/TestPOPartialAggPlan.java <https://reviews.apache.org/r/1817/#comment4312> this should be a public static final from the main body of code that you are referencing here setBoolean trunk/test/org/apache/pig/test/TestPOPartialAggPlan.java <https://reviews.apache.org/r/1817/#comment4313> p-s-f - Dmitriy On 2011-09-12 23:55:12, Thejas Nair wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/1817/ > ----------------------------------------------------------- > > (Updated 2011-09-12 23:55:12) > > > Review request for pig, Daniel Dai and Dmitriy Ryaboy. > > > Summary > ------- > > See PIG-2228 > > > This addresses bug PIG-2228. > https://issues.apache.org/jira/browse/PIG-2228 > > > Diffs > ----- > > trunk/src/org/apache/pig/Algebraic.java 1164722 > trunk/src/org/apache/pig/Main.java 1164722 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java > 1164722 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java > 1164722 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PhyPlanSetter.java > 1164722 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/EndOfAllInputSetter.java > 1164722 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java > 1164722 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PlanPrinter.java > 1164722 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POLocalRearrange.java > 1164722 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java > PRE-CREATION > trunk/src/org/apache/pig/data/DefaultTuple.java 1164722 > trunk/src/org/apache/pig/data/InternalCachedBag.java 1164722 > trunk/src/org/apache/pig/data/InternalDistinctBag.java 1164722 > trunk/src/org/apache/pig/data/InternalSortedBag.java 1164722 > trunk/src/org/apache/pig/data/SelfSpillBag.java PRE-CREATION > trunk/src/org/apache/pig/data/SizeUtil.java PRE-CREATION > trunk/src/org/apache/pig/data/SortedSpillBag.java 1164722 > trunk/test/e2e/pig/tests/nightly.conf 1164722 > trunk/test/org/apache/pig/test/TestDataBag.java 1164722 > trunk/test/org/apache/pig/test/TestPOPartialAgg.java PRE-CREATION > trunk/test/org/apache/pig/test/TestPOPartialAggPlan.java PRE-CREATION > trunk/test/org/apache/pig/test/Util.java 1164722 > trunk/test/org/apache/pig/test/utils/GenPhyOp.java 1164722 > > Diff: https://reviews.apache.org/r/1817/diff > > > Testing > ------- > > test-patch > [exec] -1 overall. > [exec] > [exec] +1 @author. The patch does not contain any @author tags. > [exec] > [exec] +1 tests included. The patch appears to include 21 new or > modified tests. > [exec] > [exec] +1 javadoc. The javadoc tool did not generate any warning > messages. > [exec] > [exec] +1 javac. The applied patch does not increase the total > number of javac compiler warnings. > [exec] > [exec] +1 findbugs. The patch does not introduce any new Findbugs > warnings. > [exec] > [exec] -1 release audit. The applied patch generated 461 release > audit warnings (more than the trunk's current 455 warnings). > release audit failures are because of jdiff changes > > All unit tests pass, new e2e tests added . > > > Thanks, > > Thejas > >
