> On Dec. 26, 2013, 4:45 p.m., Rohini Palaniswamy wrote: > > test/org/apache/pig/test/TestAccumulator.java, line 189 > > <https://reviews.apache.org/r/16463/diff/1/?file=403084#file403084line189> > > > > Any reason for adding this newly within tests? tearDown() already has > > it. > > Cheolsoo Park wrote: > I added this because in the same test case, if pigServer invokes > openIterator() multiple times, Pig generates temporary files with the same > name in subsequent calls. So I am cleaning them up after each invocation. > > Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 2002: Output > Location Validation Failed for: > 'hdfs://localhost:56901/tmp/temp-1456742965/tmp-1456742965 More info to > follow: > 8788 Output directory > hdfs://localhost:56901/tmp/temp-1456742965/tmp-1456742965 already exists > > I don't see this behavior in hadoopversion=20 but in hadoopversion=23. > Perhaps we should fix FileLocalizer#getTemporaryPath(). I can take a closer > look. > > Rohini Palaniswamy wrote: > I am sure this problem does not exist in trunk. We need to see why and > fix. Something must be broken in tez branch. > > Cheolsoo Park wrote: > Random doesn't guarantee unique numbers. We haven't seen this problem > because we were lucky. > > After changing Random.getNextInt() to UUID.randomUUID(), I no longer see > failures. > > Rohini Palaniswamy wrote: > Lot of MR golden files will have to be changed if you change the random > number generation logic. Also will lead to longer HDFS path names. How does > the same test work in MR? In fact saw that FileLocalizer.setR() is called > twice in MRCompiler resetting the static variable. Have not traced through. I > am sure we are missing something in Tez. > > Cheolsoo Park wrote: > >> Lot of MR golden files will have to be changed if you change the > random number generation logic. > > No. I am not changing MRCompiler and TestMRCompiler. > FileLocarlizer.setR() will be there only for testing purpose and backward > compatibility. > > >> How does the same test work in MR? In fact saw that > FileLocalizer.setR() is called twice in MRCompiler resetting the static > variable. Have not traced through. > > The same test passing in MR doesn't mean anything because it's > pseudo-random. In addition, we shouldn't try to workaround the problem. Using > Random to generate filenames is bad, and we should change it to UUID. > > > > > > Rohini Palaniswamy wrote: > Pseudo random means it starts repeating after quite some time and we > should never have the problem of a number repeating for a pig script when > r.nextInt() is called as the range is large. What you are seeing is that it > keeps repeating immediately. The problem is that in > MRCompiler.randomizeFileLocalizer we do a FileLocalizer.setR(new Random()); > which makes sure we have a totally random number next time. It also has a > > r = new Random(1331); > FileLocalizer.setR(r); > > just before FileLocalizer.setR(new Random()); is called and I am not sure > why that is there. In TezCompiler, we just do > > r = new Random(1331); > FileLocalizer.setR(r); > > and that is the problem. We always start with the same random number > again and again as the random generator is reset. Removing this should make > sure we don't start with the same random number again and again or we can do > a FileLocalizer.setR(new Random()); to start with a totally new random seed > again.
I think this discussion is becoming academic. Why we can't replace all this non-sense with UUID? Isn't it simpler and better? I just don't see a point to keep things unchanged in MR and Tez when there is no backward incompatibility. - Cheolsoo ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16463/#review30867 ----------------------------------------------------------- On Dec. 26, 2013, 6:11 a.m., Cheolsoo Park wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/16463/ > ----------------------------------------------------------- > > (Updated Dec. 26, 2013, 6:11 a.m.) > > > Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini > Palaniswamy. > > > Bugs: PIG-3636 > https://issues.apache.org/jira/browse/PIG-3636 > > > Repository: pig-git > > > Description > ------- > > The patch implements accumulator optimization in Tez. The changes include- > * Create AccumulatorOptimizer in Tez. > * Create AccumulatorOptimizerUtil class and factor out common functions in MR > and Tez. > * Implement accumulator logic in POShuffleTezLoad. > * Update TestAccumulator to make it run in Tez mode. > > > Diffs > ----- > > src/org/apache/pig/PigConfiguration.java 0a26e8c > > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/AccumulatorOptimizer.java > 7f9e15a > > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java > 9eed25c > > src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPackage.java > 6e04513 > > src/org/apache/pig/backend/hadoop/executionengine/tez/AccumulatorOptimizer.java > e69de29 > src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java > 722b9f6 > src/org/apache/pig/backend/hadoop/executionengine/tez/POUnionTezLoad.java > 742a33a > src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java > d42ce89 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java > c6af682 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezPlanContainer.java > e33a7c6 > > src/org/apache/pig/backend/hadoop/executionengine/util/AccumulatorOptimizerUtil.java > e69de29 > test/org/apache/pig/test/TestAccumulator.java b979649 > test/org/apache/pig/test/TestCombiner.java a227d18 > test/tez-tests fcb573e > > Diff: https://reviews.apache.org/r/16463/diff/ > > > Testing > ------- > > * TestAccumulator passes in Tez mode. > * All unit tests pass. > * All e2e tests pass. > > Note that 3 test cases in TestAccumulator are annotated as @Ignore because > SecondaryKeyOptimizer in Tez is not implement yet. The test cases expect > accumulator optimizer is applied when order-by and distinct are present in a > nested foreash because sort operator is removed by SecondaryKeyOptimizer. > Added TODO comments accordingly. > > > Thanks, > > Cheolsoo Park > >
