[jira] [Commented] (PIG-3645) Replace Random with UUID in FileLocalizer.getTemporaryPath()

Rohini Palaniswamy (JIRA) Fri, 27 Dec 2013 12:12:22 -0800

    [ 
https://issues.apache.org/jira/browse/PIG-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857691#comment-13857691
 ]


Rohini Palaniswamy commented on PIG-3645:
-----------------------------------------

This change is not required.  Few reasons:
  - new Random() gives enough randomness as it takes into account 
System.nanoTime() and UUID is not going to add any advantage. Infact UUID uses 
Random internally. Differences are it uses SecureRandom (which we don't need) 
and does more bytes (which also we don't need as integer range itself is huge 
and is not going to repeat within a execution of a pig script) and uses clock 
sequence (System.nanoTime() in Random is good enough for us). And Random has 
been working fine in production for 7+ years without collision between users 
and I don't see any reason to change that.  If UUID were to give some advantage 
or fixes a real issue, I would certainly go for it.
  - UUID returns takes up 36 chars, while random we are only using 10 chars. 
Would prefer shorter filenames to take up less space in NN and prevent GCs. 
  - Problem is not with randomness of Random but mix up in MRComplier code 
resetting the seed of Random to take care of tests.  What is been done in MR is 
that they set new Random(1331) in MRCompiler so tests get the same plan 
everytime. But they reset it to new Random() again when the script is actually 
run (in MapreduceLauncher) so that it does not write to the same file again and 
again. Simple change would be to move the new Random(1331) to TestMRCompiler 
and org.apache.pig.test.Util.buildMRPlan() as that should belong in the tests 
and not in the normal code path. If that is done, 
comp.randomizeFileLocalizer(); can be removed from MapreduceLauncher.

> Replace Random with UUID in FileLocalizer.getTemporaryPath()
> ------------------------------------------------------------
>
>                 Key: PIG-3645
>                 URL: https://issues.apache.org/jira/browse/PIG-3645
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>            Priority: Minor
>             Fix For: 0.13.0
>
>         Attachments: PIG-3645-1.patch
>
>
> Currently, temporary paths are generated by FileLocalizer using 
> Random.nextInt(). To provide strong randomness, MapReduceLauncher resets the 
> Random object every time when compiling physical plan to MR plan:
> {code}
> MRCompiler comp = new MRCompiler(php, pc); 
> comp.randomizeFileLocalizer(); // This in turn calls FileLocalizer.setR(new 
> Random()).
> {code}
> Besides, there are a couple of places calling FileLocalizer.setR() (e.g. 
> MRCompiler) with some random seed.
> I think-
> # Randomizing Random seed is unnecessary if we switch to UUID.
> # Setting Random objects in code like this is error-prone because it can be 
> easily broken by having or missing a FileLocalizer.setR() somewhere else. See 
> an example [here|http://search-hadoop.com/m/2nxTzQXfHw1].
> So I propose that we remove all this "randomizing Random seed" code and use 
> UUID instead in temporary paths.
> For unit tests that compare the results against gold files, we should still 
> allow to set Random seed through FileLocalizer.setR(). But this method will 
> be annotated as "VisibleForTesting" to ensure it is not used nowhere else 
> other than in unit tests.
> Regarding the existing gold files, they can be easily regenerated by 
> TestMRCompiler as follows-
> {code}
> FileOutputStream fos = new FileOutputStream(expectedFile + "_new");
> PrintWriter pw = new PrintWriter(fos);
> pw.write(compiledPlan);
> {code}
> I assume there won't be any kind of regressions due to this change. But 
> please let me know if I am wrong.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

[jira] [Commented] (PIG-3645) Replace Random with UUID in FileLocalizer.getTemporaryPath()

Reply via email to