Cheolsoo Park created PIG-3645:
----------------------------------
Summary: Replace Random with UUID in
FileLocalizer.getTemporaryPath()
Key: PIG-3645
URL: https://issues.apache.org/jira/browse/PIG-3645
Project: Pig
Issue Type: Improvement
Components: impl
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Priority: Minor
Fix For: 0.13.0
Currently, temporary paths are generated by FileLocalizer using
Random.nextInt(). To provide strong randomness, MapReduceLauncher resets the
Random object every time when compiling physical plan to MR plan:
{code}
MRCompiler comp = new MRCompiler(php, pc);
comp.randomizeFileLocalizer(); // This in turn calls FileLocalizer.setR(new
Random()).
{code}
Besides, there are a couple of places calling FileLocalizer.setR() (e.g.
MRCompiler) with some random seed.
I think-
# Randomizing Random seed is unnecessary if we switch to UUID.
# Setting Random objects in code like this is error-prone because it can be
easily broken by having or missing a FileLocalizer.setR() somewhere else. See
an example [here|http://search-hadoop.com/m/2nxTzQXfHw1].
So I propose that we remove all this "randomizing Random seed" code and use
UUID instead in temporary paths.
For unit tests that compare the results against gold files, we should still
allow to set Random seed through FileLocalizer.setR(). But this method will be
annotated as "VisibleForTesting" to ensure it is not used nowhere else other
than in unit tests.
Regarding the existing gold files, they can be easily regenerated by
TestMRCompiler as follows-
{code}
FileOutputStream fos = new FileOutputStream(expectedFile + "_new");
PrintWriter pw = new PrintWriter(fos);
pw.write(compiledPlan);
{code}
I assume there won't be any kind of regressions due to this change. But please
let me know if I am wrong.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)