Ray Chiang created MAPREDUCE-6685:
-------------------------------------
Summary: LocalDistributedCacheManager can have overlapping filenames
Key: MAPREDUCE-6685
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6685
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Ray Chiang
Assignee: Ray Chiang
LocalDistributedCacheManager has this setup:
bq. AtomicLong uniqueNumberGenerator = new AtomicLong(System.currentTimeMillis());
to create this temporary filename:
bq. new FSDownload(localFSFileContext, ugi, conf, new Path(destPath, Long.toString(uniqueNumberGenerator.incrementAndGet())), resource);
when using LocalJobRunner. When two or more LocalJobRunner jobs start on the same machine, they can end up with the same starting timestamp, or with starting timestamps so close together that their ranges of incremented values overlap. In either case, two jobs can generate the same temporary filename.
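The overlap can be illustrated with a small sketch that mimics the quoted pattern: each simulated job seeds its own AtomicLong with a start time and increments it once per localized resource. The class and method names are hypothetical, and fixed start times stand in for System.currentTimeMillis() so the result is deterministic; this is not the Hadoop code itself.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

public class TimestampCollisionDemo {

    // Mimics the quoted pattern: a counter seeded with a start time,
    // incremented once per localized resource to produce a directory name.
    static Set<String> generateNames(long seed, int count) {
        AtomicLong uniqueNumberGenerator = new AtomicLong(seed);
        Set<String> names = new HashSet<>();
        for (int i = 0; i < count; i++) {
            names.add(Long.toString(uniqueNumberGenerator.incrementAndGet()));
        }
        return names;
    }

    // Returns the names that both simulated jobs would generate.
    static Set<String> collisions(long seedA, long seedB, int count) {
        Set<String> shared = new HashSet<>(generateNames(seedA, count));
        shared.retainAll(generateNames(seedB, count));
        return shared;
    }

    public static void main(String[] args) {
        long start = 1_000_000L; // hypothetical fixed start time in ms
        // Job B starts 3 ms after job A; each localizes 5 resources.
        System.out.println("Colliding names: " + collisions(start, start + 3, 5));
    }
}
```

With these numbers, job A uses 1000001 through 1000005 and job B uses 1000004 through 1000008, so two names collide. Identical start times collide on every name, while start times at least as far apart as the number of resources do not collide at all.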
Given the following assumptions:
1) If the timestamps are the same, then the most commonly used seed, the timestamp itself, will also be the same.
2) Process IDs will very likely be unique, but will likely be close in value.
3) Thread IDs are not guaranteed to be unique across processes.
A unique ID based on PID as a seed (in addition to the timestamp) should be a
better unique identifier for temporary filenames.
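One possible way to fold the PID into the name is sketched below. It is an assumption for illustration, not the committed fix: the PID is used as a prefix alongside the timestamp-seeded counter rather than being mixed into a single numeric seed, because two live processes on the same host cannot share a PID, so their names differ even when their counters overlap. The class name and the "<pid>_<counter>" format are hypothetical, and ProcessHandle requires Java 9 or later.

```java
import java.util.concurrent.atomic.AtomicLong;

public class PidSeededNames {

    private final long pid;
    private final AtomicLong counter;

    PidSeededNames(long pid, long startMillis) {
        this.pid = pid;
        this.counter = new AtomicLong(startMillis);
    }

    // Builds a generator for the current JVM (ProcessHandle is Java 9+).
    static PidSeededNames forCurrentProcess() {
        return new PidSeededNames(ProcessHandle.current().pid(),
                System.currentTimeMillis());
    }

    // Hypothetical name format "<pid>_<counter>": the PID prefix keeps
    // concurrently running processes apart even if their counters overlap.
    String nextName() {
        return pid + "_" + counter.incrementAndGet();
    }

    public static void main(String[] args) {
        PidSeededNames gen = forCurrentProcess();
        System.out.println(gen.nextName());
    }
}
```

PIDs can be reused once a process exits, so this guards against collisions between concurrently running jobs rather than against leftover directories from earlier runs.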
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)