Hello All,

I'm trying to run the Pig e2e tests in parallel, and in local mode many
of them fail like this:

WARN  org.apache.hadoop.mapred.Task - Could not find output size
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
output/file.out in any of the configured local directories
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
        at org.apache.hadoop.mapred.MapOutputFile.getOutputFile(MapOutputFile.java:56)
        at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:944)
        at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:924)
        at org.apache.hadoop.mapred.Task.done(Task.java:875)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:374)

The problem seems to be concurrent access to the JobTracker's temporary
directory: file.out is a temporary JobTracker file. The lsof output
below, taken while the tests were running, shows different test
processes (note the PIDs) opening files under the same jobcache
directory, e.g. job_local_0001:

$ lsof | grep output
java      20719    ikatsov   13r      REG                8,1      3486   17039996 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_0.out
java      20719    ikatsov   16r      REG                8,1    349196   17039986 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_1.out

$ lsof | grep output
java      25410    ikatsov   13w      REG                8,1      8145   17039997 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/spill0.out

$ lsof | grep output
java       2223    ikatsov   13r      REG                8,1    289196   16384629 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0003/attempt_local_0003_r_000000_0/output/map_0.out

$ lsof | grep output
java      12187    ikatsov   14r      REG                8,1    349196   17039996 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_0.out
java      12187    ikatsov   17r      REG                8,1    349196   17039999 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_1.out


Is there a way to specify the Hadoop temporary directory
(mapreduce.cluster.local.dir) when launching Pig in local mode?
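
To make the question concrete, this is roughly what I would like each
parallel test run to do. It is only a sketch and I have not verified
that it works: it assumes that Pig forwards -D properties to the local
job runner, which is exactly the part I am unsure about. The helper
name make_local_dir and the script name test.pig are made up for
illustration. I set mapred.local.dir as well because it is the older
name of the same property.

```shell
#!/bin/sh
# Hypothetical helper: create a private scratch directory for one test run,
# so that parallel runs do not share /tmp/hadoop-$USER/mapred/local.
make_local_dir() {
    mktemp -d "${TMPDIR:-/tmp}/pig-e2e-local.XXXXXX"
}

LOCAL_DIR=$(make_local_dir)

# Assumption (unverified): Pig passes these -D properties through to Hadoop
# in local mode. test.pig is a placeholder script name.
PIG_CMD="pig -Dmapreduce.cluster.local.dir=$LOCAL_DIR -Dmapred.local.dir=$LOCAL_DIR -x local test.pig"

# Print the command instead of running it; each test run would execute it.
echo "$PIG_CMD"
```

If the properties are honoured, every run would get its own jobcache
tree and the job_local_0001 directories would no longer collide.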

Thank you in advance,
Ilya
