[
https://issues.apache.org/jira/browse/MAPREDUCE-6992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Philip Zeyliger resolved MAPREDUCE-6992.
----------------------------------------
Resolution: Duplicate
I agree; this is a dupe. Thanks!
> Race for temp dir in LocalDistributedCacheManager.java
> ------------------------------------------------------
>
> Key: MAPREDUCE-6992
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6992
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Philip Zeyliger
>
> When localizing distributed cache files in "local" mode,
> LocalDistributedCacheManager.java chooses a "unique" directory based on a
> millisecond timestamp. When code runs with some parallelism, two
> localizations can start in the same millisecond and pick the same directory.
> The error message looks like
> {code}
> java.io.FileNotFoundException: jenkins/mapred/local/1508958341829_tmp does not exist
> {code}
> I ran into this in Impala's data loading. There, we run a HiveServer2 which
> runs in MapReduce. If multiple queries are submitted simultaneously to the
> HS2, they conflict on this directory. A web search shows that StreamSets ran
> into a very similar-looking problem:
> https://issues.streamsets.com/browse/SDC-5473
> I believe the buggy code is (link:
> https://github.com/apache/hadoop/blob/2da654e34a436aae266c1fbdec5c1067da8d854e/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java#L94)
> {code}
> // Generating unique numbers for FSDownload.
> AtomicLong uniqueNumberGenerator =
>     new AtomicLong(System.currentTimeMillis());
> {code}
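To illustrate the race described above, here is a minimal sketch, not the Hadoop code itself. It assumes that two jobs each create their own counter seeded from the clock, as in the quoted snippet, and that the directory name is the next counter value plus a "_tmp" suffix (a guess based on the shape of the path in the error message). The class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

public class TimestampSeedCollision {
    /** Mimics the quoted seeding pattern: a counter that starts at a millisecond timestamp. */
    static AtomicLong newGenerator(long nowMillis) {
        return new AtomicLong(nowMillis);
    }

    /** Hypothetical naming scheme, assumed from the error message: next counter value + "_tmp". */
    static String tmpDirName(AtomicLong generator) {
        return generator.incrementAndGet() + "_tmp";
    }

    public static void main(String[] args) {
        // Two jobs submitted within the same millisecond observe the same clock value.
        long now = System.currentTimeMillis();
        AtomicLong jobA = newGenerator(now);
        AtomicLong jobB = newGenerator(now);

        String a = tmpDirName(jobA);
        String b = tmpDirName(jobB);

        // Each counter is atomic on its own, but the two counters are independent,
        // so both jobs derive the same "unique" name.
        System.out.println(a + " vs " + b + " collide: " + a.equals(b));
    }
}
```

The AtomicLong only guarantees uniqueness within one generator instance. Uniqueness across instances rests entirely on the seed, which is why a millisecond clock is not enough once jobs start concurrently.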
> Notably, a similar code path uses an actual random number generator
> ({{LocalJobRunner.java}},
> https://github.com/apache/hadoop/blob/2da654e34a436aae266c1fbdec5c1067da8d854e/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java#L912).
> {code}
> public String getStagingAreaDir() throws IOException {
>   Path stagingRootDir = new Path(conf.get(JTConfig.JT_STAGING_AREA_ROOT,
>       "/tmp/hadoop/mapred/staging"));
>   UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
>   String user;
>   randid = rand.nextInt(Integer.MAX_VALUE);
> {code}
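For comparison, one way to avoid this kind of collision is to let the JDK create the directory, so that uniqueness is enforced at creation time rather than assumed from a clock value. The sketch below is only an illustration under that assumption; it is not necessarily how the issue was resolved in Hadoop, and the class name, method name, and "distcache-" prefix are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class UniqueLocalDir {
    /**
     * Creates a fresh working directory under root. Files.createTempDirectory
     * picks the name and creates the directory in one step, so two callers
     * never end up sharing a directory.
     */
    static Path newWorkDir(Path root) throws IOException {
        Files.createDirectories(root); // no-op if root already exists
        return Files.createTempDirectory(root, "distcache-");
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the local-mode root directory.
        Path root = Files.createTempDirectory("mapred-local-demo");

        Path first = newWorkDir(root);
        Path second = newWorkDir(root);

        System.out.println("first:  " + first.getFileName());
        System.out.println("second: " + second.getFileName());
        System.out.println("distinct: " + !first.equals(second));
    }
}
```

Seeding from a random number, as LocalJobRunner does, makes a collision unlikely; creating the directory through the filesystem makes it detectable, because the creation itself fails if the name is already taken.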