[ https://issues.apache.org/jira/browse/HADOOP-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385574#comment-14385574 ]
zhihai xu commented on HADOOP-10924:
------------------------------------
[~wattsinabox], thanks for working on this issue. Either jobID_UUID or
jobID_Timestamp is fine with me.
I found out why TestMRWithDistributedCache fails for you when the directory
name is the JobId on its own.
I can reproduce the failure with the following error message:
{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.mapred.TestMRWithDistributedCache
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.555 sec <<< FAILURE! - in org.apache.hadoop.mapred.TestMRWithDistributedCache
testLocalJobRunner(org.apache.hadoop.mapred.TestMRWithDistributedCache)  Time elapsed: 0.866 sec  <<< ERROR!
java.io.IOException: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File file:/Users/zxu/upstream/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/build/test/mapred/local/job_local65835876_0001_tmp does not exist
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:188)
	at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:150)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:164)
	at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:759)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:245)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
	at org.apache.hadoop.mapred.TestMRWithDistributedCache.testWithConf(TestMRWithDistributedCache.java:186)
	at org.apache.hadoop.mapred.TestMRWithDistributedCache.testLocalJobRunner(TestMRWithDistributedCache.java:198)
Caused by: java.io.FileNotFoundException: File file:/Users/zxu/upstream/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/build/test/mapred/local/job_local65835876_0001_tmp does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:612)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:821)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:602)
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileLinkStatusInternal(RawLocalFileSystem.java:837)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:823)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatus(RawLocalFileSystem.java:794)
	at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:1267)
	at org.apache.hadoop.fs.DelegateToFileSystem.renameInternal(DelegateToFileSystem.java:182)
	at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:748)
	at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:236)
	at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678)
	at org.apache.hadoop.fs.FileContext.rename(FileContext.java:948)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{code}
This is because multiple DistributedCache resources use the same destination
file name during download, which causes file-name collisions across the
FSDownload calls of a single job.
If I use jobId.toString() + '_' +
Long.toString(uniqueNumberGenerator.incrementAndGet()), this error won't happen.
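To make the suggested naming concrete, here is a minimal sketch (the
surrounding variable names are illustrative, not the exact code in
LocalDistributedCacheManager):
{code}
// Each LocalDistributedCacheManager instance seeds its own counter:
AtomicLong uniqueNumberGenerator = new AtomicLong(System.currentTimeMillis());

// Suggested destination name for each downloaded resource: the jobId prefix
// keeps names unique across concurrent local jobs, while the counter keeps
// them unique across the resources of a single job.
String destName = jobId.toString() + '_'
    + Long.toString(uniqueNumberGenerator.incrementAndGet());
{code}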
As [~ozawa] suggests, you can create a test case that uses a thread pool to
start many MR jobs; each MR job can be similar to the test case
TestMRWithDistributedCache#testLocalJobRunner. You can do something like the
following:
{code}
ExecutorService exec = null;
try {
  ThreadFactory tf = new ThreadFactoryBuilder()
      .setNameFormat("testLocalJobRunner #%d")
      .build();
  exec = Executors.newCachedThreadPool(tf);
  List<Future<Integer>> list = new ArrayList<Future<Integer>>();
  for (int i = 0; i < 10; i++) {
    Callable<Integer> test = new TestCallable(i);
    Future<Integer> future = exec.submit(test);
    list.add(future);
  }
  for (Future<Integer> f : list) {
    try {
      int result = f.get();
    } catch (InterruptedException e) {
      // ignore the interruption and check the remaining futures
    } catch (ExecutionException e) {
      Assert.fail(".....");
    }
  }
} finally {
  if (exec != null) {
    exec.shutdown();
  }
}

public class TestCallable implements Callable<Integer> {
  int index;

  public TestCallable(int i) {
    index = i;
  }

  @Override
  public Integer call() throws Exception {
    Configuration c = new Configuration();
    c.set(JTConfig.JT_IPC_ADDRESS, "local");
    c.set("fs.defaultFS", "file:///");
    testWithConf(c, index);
    return index;
  }
}
{code}
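The cached thread pool submits all ten jobs at almost the same time, so their
LocalDistributedCacheManager instances race on the same local directory; a
destination name derived only from the timestamp-seeded counter can then
collide across jobs, while the jobId-plus-counter (or UUID) name should stay
unique both across jobs and across the resources of a single job.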
> LocalDistributedCacheManager for concurrent sqoop processes fails to create unique directories
> ----------------------------------------------------------------------------------------------
>
> Key: HADOOP-10924
> URL: https://issues.apache.org/jira/browse/HADOOP-10924
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: William Watson
> Assignee: William Watson
> Attachments: HADOOP-10924.02.patch, HADOOP-10924.03.jobid-plus-uuid.patch
>
>
> Kicking off many sqoop processes in different threads results in:
> {code}
> 2014-08-01 13:47:24 -0400: INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot overwrite non empty destination directory /tmp/hadoop-hadoop/mapred/local/1406915233073
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> 2014-08-01 13:47:24 -0400: INFO - at java.security.AccessController.doPrivileged(Native Method)
> 2014-08-01 13:47:24 -0400: INFO - at javax.security.auth.Subject.doAs(Subject.java:415)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
> 2014-08-01 13:47:24 -0400: INFO - at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
> {code}
> This happens if two jobs are kicked off in the same second. The issue is the
> following lines of code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class:
> {code}
> // Generating unique numbers for FSDownload.
> AtomicLong uniqueNumberGenerator =
> new AtomicLong(System.currentTimeMillis());
> {code}
> and
> {code}
> Long.toString(uniqueNumberGenerator.incrementAndGet())),
> {code}