> On Dec. 26, 2013, 4:45 p.m., Rohini Palaniswamy wrote: > > Patch looks good. Just few minor review comments. > > > > > Note that 3 test cases in TestAccumulator are annotated as @Ignore > > > because SecondaryKeyOptimizer in Tez is not implement yet. The test cases > > > expect accumulator optimizer is applied when order-by and distinct are > > > present in a nested foreash because sort operator is removed by > > > SecondaryKeyOptimizer. Added TODO comments accordingly. > > > > I have the SecondaryKeyOptimizer fixed in my patch for PIG-3626. Will > > take care of reverting them back in that patch. > > > > > >
That will be awesome! > On Dec. 26, 2013, 4:45 p.m., Rohini Palaniswamy wrote: > > test/org/apache/pig/test/TestAccumulator.java, line 114 > > <https://reviews.apache.org/r/16463/diff/1/?file=403084#file403084line114> > > > > Why do we specifically need homeDirOnDfs? Isn't that the default > > current directory? I wanted to ask you about this. If I don't specify the home dir, getMkDirCommandForHadoop2_0() returns "" because parentDir is empty. > On Dec. 26, 2013, 4:45 p.m., Rohini Palaniswamy wrote: > > src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java, > > lines 148-154 > > <https://reviews.apache.org/r/16463/diff/1/?file=403078#file403078line148> > > > > I see that unsafe typecast to Iterable<NullableTuple> is not possible > > in java. > > > > But copying over to a new arraylist seems to be inefficient just to get > > a Iterable<NullableTuple>. Can we write a wrapper iterator that implements > > Iterable<NullableTuple> wrapping Iterable<Object> instead? Actually, that for loop does more than converting type. It also merges multiple lists into a single list. See the following code snippet- List<NullableTuple> nTups = new ArrayList<NullableTuple>(); // nTups is outside of the loop. for (int i = 0; i < numInputs; i++) { cur = readers.get(i).getCurrentKey(); if (comparator.compare(min, cur) == 0) { Iterable<Object> vals = readers.get(i).getCurrentValues(); for (Object val : vals) { nTups.add((NullableTuple) val); // adding vals from multiple inputs to the same list. } buffer.setKey(cur); buffer.setIterator(nTups.iterator()); // reset the iterator each time. bag = new AccumulativeBag(buffer, i); } So if I wrap the iterator returned by readers.get(i).getCurrentValues(), it won't work. I am open to any suggestion though. > On Dec. 26, 2013, 4:45 p.m., Rohini Palaniswamy wrote: > > test/org/apache/pig/test/TestCombiner.java, line 58 > > <https://reviews.apache.org/r/16463/diff/1/?file=403085#file403085line58> > > > > We seem to be doing this in a couple of places now. If there are issues > > we hit because of session reuse (static variables, etc), then we should try > > fix them so that session reuse works with grunt shell. Not important for > > this jira as we need to get the core stuff out first. Can you create a > > separate jira mentioning the issues that you faced so far with session > > reuse so that we come back later and fix them some time. Thank you for asking this. After PIG-3602, I still occasionally see a dangling process after TestCombiner. I don't see it always, so I guess timing matters. I will definitely follow up with a jira. > On Dec. 26, 2013, 4:45 p.m., Rohini Palaniswamy wrote: > > test/org/apache/pig/test/TestAccumulator.java, line 189 > > <https://reviews.apache.org/r/16463/diff/1/?file=403084#file403084line189> > > > > Any reason for adding this newly within tests? tearDown() already has > > it. I added this because in the same test case, if pigServer invokes openIterator() multiple times, Pig generates temporary files with the same name in subsequent calls. So I am cleaning them up after each invocation. Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 2002: Output Location Validation Failed for: 'hdfs://localhost:56901/tmp/temp-1456742965/tmp-1456742965 More info to follow: 8788 Output directory hdfs://localhost:56901/tmp/temp-1456742965/tmp-1456742965 already exists I don't see this behavior in hadoopversion=20 but in hadoopversion=23. Perhaps we should fix FileLocalizer#getTemporaryPath(). I can take a closer look. > On Dec. 26, 2013, 4:45 p.m., Rohini Palaniswamy wrote: > > test/org/apache/pig/test/TestAccumulator.java, line 530 > > <https://reviews.apache.org/r/16463/diff/1/?file=403084#file403084line530> > > > > Can be removed. tearDown already does it Same as above. pigServer invokes openIterator() inside a while loop, so I am cleaning up temporary files after each iteration. - Cheolsoo ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16463/#review30867 ----------------------------------------------------------- On Dec. 26, 2013, 6:11 a.m., Cheolsoo Park wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/16463/ > ----------------------------------------------------------- > > (Updated Dec. 26, 2013, 6:11 a.m.) > > > Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini > Palaniswamy. > > > Bugs: PIG-3636 > https://issues.apache.org/jira/browse/PIG-3636 > > > Repository: pig-git > > > Description > ------- > > The patch implements accumulator optimization in Tez. The changes include- > * Create AccumulatorOptimizer in Tez. > * Create AccumulatorOptimizerUtil class and factor out common functions in MR > and Tez. > * Implement accumulator logic in POShuffleTezLoad. > * Update TestAccumulator to make it run in Tez mode. > > > Diffs > ----- > > src/org/apache/pig/PigConfiguration.java 0a26e8c > > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/AccumulatorOptimizer.java > 7f9e15a > > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java > 9eed25c > > src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPackage.java > 6e04513 > > src/org/apache/pig/backend/hadoop/executionengine/tez/AccumulatorOptimizer.java > e69de29 > src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java > 722b9f6 > src/org/apache/pig/backend/hadoop/executionengine/tez/POUnionTezLoad.java > 742a33a > src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java > d42ce89 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java > c6af682 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezPlanContainer.java > e33a7c6 > > src/org/apache/pig/backend/hadoop/executionengine/util/AccumulatorOptimizerUtil.java > e69de29 > test/org/apache/pig/test/TestAccumulator.java b979649 > test/org/apache/pig/test/TestCombiner.java a227d18 > test/tez-tests fcb573e > > Diff: https://reviews.apache.org/r/16463/diff/ > > > Testing > ------- > > * TestAccumulator passes in Tez mode. > * All unit tests pass. > * All e2e tests pass. > > Note that 3 test cases in TestAccumulator are annotated as @Ignore because > SecondaryKeyOptimizer in Tez is not implement yet. The test cases expect > accumulator optimizer is applied when order-by and distinct are present in a > nested foreash because sort operator is removed by SecondaryKeyOptimizer. > Added TODO comments accordingly. > > > Thanks, > > Cheolsoo Park > >