I looked into this a bit more. It seems the LocalJobRunner by default uses a single local map-reduce directory (named "localRunner") for all jobs, so two simultaneous jobs will step on each other's files.
You can work around this, however: set the "mapred.local.dir" property in each job's configuration to something unique, like a UUID. The LocalJobRunner will then use that instead of the default "localRunner" directory, and the jobs shouldn't step on each other.

On Jul 1, 2011, at 5:42 AM, Yaozhen Pan wrote:

> Hi,
>
> I am not sure if this question (as title) has been asked before, but I
> didn't find an answer by googling.
>
> I'd like to explain the scenario of my problem:
> My program launches several threads at the same time, and each thread
> submits a Hadoop job and waits for the job to complete.
> The unit tests were run in local mode, mini-cluster and the real Hadoop
> cluster.
> I found the unit tests may fail in local mode, but they always succeeded in
> mini-cluster and the real Hadoop cluster.
> When a unit test failed in local mode, the causes varied (stack traces are
> posted at the end of this mail).
>
> It seems that running multiple jobs from multiple threads is not supported
> in local mode, is it?
>
> Error 1:
> 2011-07-01 20:24:36,460 WARN [Thread-38] mapred.LocalJobRunner
> (LocalJobRunner.java:run(256)) - job_local_0001
> java.io.FileNotFoundException: File
> build/test/tmp/mapred/local/taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/spill0.out
> does not exist.
> at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:192)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
> at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:253)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1447)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
> at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>
> Error 2:
> 2011-07-01 19:00:25,546 INFO [Thread-32] fs.FSInputChecker
> (FSInputChecker.java:readChecksumChunk(247)) - Found checksum error: b[3584,
> 4096]=696f6e69643c2f6e616d653e3c76616c75653e47302e4120636f696e636964656e63652047312e413c2f76616c75653e3c2f70726f70657274793e0a3c70726f70657274793e3c6e616d653e6d61707265642e6a6f622e747261636b65722e706572736973742e6a6f627374617475732e6469723c2f6e616d653e3c76616c75653e2f6a6f62747261636b65722f6a6f6273496e666f3c2f76616c75653e3c2f70726f70657274793e0a3c70726f70657274793e3c6e616d653e6d61707265642e6a61723c2f6e616d653e3c76616c75653e66696c653a2f686f6d652f70616e797a682f6861646f6f7063616c632f6275696c642f746573742f746d702f6d61707265642f73797374656d2f6a6f625f6c6f63616c5f303030332f6a6f622e6a61723c2f76616c75653e3c2f70726f70657274793e0a3c70726f70657274793e3c6e616d653e66732e73332e6275666665722e6469723c2f6e616d653e3c76616c75653e247b6861646f6f702e746d702e6469727d2f73333c2f76616c75653e3c2f70726f70657274793e0a3c70726f70657274793e3c6e616d653e6a6f622e656e642e72657472792e617474656d7074733c2f6e616d653e3c76616c75653e303c2f76616c75653e3c2f70726f70657274793e0a3c70726f70657274793e3c6e616d653e66732e66696c652e696d706c3c2f6e616d653e3c76616c75653e6f
> org.apache.hadoop.fs.ChecksumException: Checksum error:
> file:/home/hadoop-user/hadoop-proj/build/test/tmp/mapred/system/job_local_0003/job.xml
> at 3584
> at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
> at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
> at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
> at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
> at java.io.DataInputStream.read(DataInputStream.java:83)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:49)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:209)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
> at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:61)
> at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1197)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:92)
> at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:373)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:800)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:448)
> at hadoop.GroupingRunnable.run(GroupingRunnable.java:126)
> at java.lang.Thread.run(Thread.java:619)
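For reference, a minimal sketch of the workaround described above. The helper method, class name, and base-directory path are illustrative (not from any Hadoop API); only the "mapred.local.dir" property name and the Job configuration call shown in the comment are Hadoop's own. The Hadoop-specific call is kept in a comment so the snippet stays self-contained:

```java
import java.util.UUID;

public class UniqueLocalDir {

    // Build a unique value for "mapred.local.dir" per job, so that
    // concurrent jobs run by the LocalJobRunner don't all share the
    // default "localRunner" directory. The base path is just an example.
    static String uniqueLocalDir(String baseTmpDir) {
        return baseTmpDir + "/mapred-local-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        String dir = uniqueLocalDir("build/test/tmp");

        // In each thread's job setup you would then do (hypothetical
        // variable name `job`, real Hadoop API calls):
        //
        //   job.getConfiguration().set("mapred.local.dir", dir);
        //   job.waitForCompletion(true);

        System.out.println(dir);
    }
}
```

Since each thread gets its own UUID-suffixed directory, the spill and job files from one local job can no longer be renamed or overwritten by another.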
