> On Dec. 26, 2013, 4:45 p.m., Rohini Palaniswamy wrote:
> > test/org/apache/pig/test/TestAccumulator.java, line 189
> > <https://reviews.apache.org/r/16463/diff/1/?file=403084#file403084line189>
> >
> >     Any reason for adding this newly within tests? tearDown() already has 
> > it.
> 
> Cheolsoo Park wrote:
>     I added this because in the same test case, if pigServer invokes 
> openIterator() multiple times, Pig generates temporary files with the same 
> name in subsequent calls. So I am cleaning them up after each invocation.
>     
>     Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 2002: Output 
> Location Validation Failed for: 
> 'hdfs://localhost:56901/tmp/temp-1456742965/tmp-1456742965 More info to 
> follow:
>     8788 Output directory 
> hdfs://localhost:56901/tmp/temp-1456742965/tmp-1456742965 already exists
>     
>     I don't see this behavior in hadoopversion=20 but in hadoopversion=23. 
> Perhaps we should fix FileLocalizer#getTemporaryPath(). I can take a closer 
> look.
> 
> Rohini Palaniswamy wrote:
>     I am sure this problem does not exist in trunk. We need to see why and 
> fix. Something must be broken in tez branch.
> 
> Cheolsoo Park wrote:
>     Random doesn't guarantee unique numbers. We haven't seen this problem 
> because we were lucky.
>     
>     After changing Random.getNextInt() to UUID.randomUUID(), I no longer see 
> failures.
> 
> Rohini Palaniswamy wrote:
>     Lot of MR golden files will have to be changed if you change the random 
> number generation logic. Also will lead to longer HDFS path names. How does 
> the same test work in MR? In fact saw that FileLocalizer.setR() is called 
> twice in MRCompiler resetting the static variable. Have not traced through. I 
> am sure we are missing something in Tez.
> 
> Cheolsoo Park wrote:
>     >> Lot of MR golden files will have to be changed if you change the 
> random number generation logic.
>     
>     No. I am not changing MRCompiler and TestMRCompiler. 
> FileLocarlizer.setR() will be there only for testing purpose and backward 
> compatibility. 
>     
>     >> How does the same test work in MR? In fact saw that 
> FileLocalizer.setR() is called twice in MRCompiler resetting the static 
> variable. Have not traced through.
>     
>     The same test passing in MR doesn't mean anything because it's 
> pseudo-random. In addition, we shouldn't try to workaround the problem. Using 
> Random to generate filenames is bad, and we should change it to UUID.
>     
>     
>     
>

Pseudo random means it starts repeating after quite some time and we should 
never have the problem of a number repeating for a pig script when r.nextInt() 
is called as the range is large. What you are seeing is that it keeps repeating 
immediately. The problem is that in MRCompiler.randomizeFileLocalizer we do a 
FileLocalizer.setR(new Random()); which makes sure we have a totally random 
number next time. It also has a

r = new Random(1331);
        FileLocalizer.setR(r); 

just before FileLocalizer.setR(new Random()); is called and I am not sure why 
that is there. In TezCompiler, we just do 

r = new Random(1331);
FileLocalizer.setR(r);

and that is the problem. We always start with the same random number again and 
again as the random generator is reset. Removing this should make sure we don't 
start with the same random number again and again or we can do a 
FileLocalizer.setR(new Random()); to start with a totally new random seed again.


- Rohini


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16463/#review30867
-----------------------------------------------------------


On Dec. 26, 2013, 6:11 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16463/
> -----------------------------------------------------------
> 
> (Updated Dec. 26, 2013, 6:11 a.m.)
> 
> 
> Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini 
> Palaniswamy.
> 
> 
> Bugs: PIG-3636
>     https://issues.apache.org/jira/browse/PIG-3636
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> The patch implements accumulator optimization in Tez. The changes include-
> * Create AccumulatorOptimizer in Tez.
> * Create AccumulatorOptimizerUtil class and factor out common functions in MR 
> and Tez.
> * Implement accumulator logic in POShuffleTezLoad.
> * Update TestAccumulator to make it run in Tez mode.
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/PigConfiguration.java 0a26e8c 
>   
> src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/AccumulatorOptimizer.java
>  7f9e15a 
>   
> src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java
>  9eed25c 
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPackage.java
>  6e04513 
>   
> src/org/apache/pig/backend/hadoop/executionengine/tez/AccumulatorOptimizer.java
>  e69de29 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java 
> 722b9f6 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POUnionTezLoad.java 
> 742a33a 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 
> d42ce89 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java 
> c6af682 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezPlanContainer.java 
> e33a7c6 
>   
> src/org/apache/pig/backend/hadoop/executionengine/util/AccumulatorOptimizerUtil.java
>  e69de29 
>   test/org/apache/pig/test/TestAccumulator.java b979649 
>   test/org/apache/pig/test/TestCombiner.java a227d18 
>   test/tez-tests fcb573e 
> 
> Diff: https://reviews.apache.org/r/16463/diff/
> 
> 
> Testing
> -------
> 
> * TestAccumulator passes in Tez mode.
> * All unit tests pass.
> * All e2e tests pass.
> 
> Note that 3 test cases in TestAccumulator are annotated as @Ignore because 
> SecondaryKeyOptimizer in Tez is not implement yet. The test cases expect 
> accumulator optimizer is applied when order-by and distinct are present in a 
> nested foreash because sort operator is removed by SecondaryKeyOptimizer. 
> Added TODO comments accordingly.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>

Reply via email to