Hello All,

I'm trying to run the Pig e2e tests in parallel, and in local mode many of them fail with errors like this:
WARN org.apache.hadoop.mapred.Task - Could not find output size
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/file.out in any of the configured local directories
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
        at org.apache.hadoop.mapred.MapOutputFile.getOutputFile(MapOutputFile.java:56)
        at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:944)
        at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:924)
        at org.apache.hadoop.mapred.Task.done(Task.java:875)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:374)

The problem seems to be concurrent access to the JobTracker's temporary directory (file.out is a temporary JobTracker file). lsof shows clearly that different tests open files in the same directory:

$ lsof | grep output
java 20719 ikatsov 13r REG 8,1 3486 17039996 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_0.out
java 20719 ikatsov 16r REG 8,1 349196 17039986 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_1.out

$ lsof | grep output
java 25410 ikatsov 13w REG 8,1 8145 17039997 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/spill0.out

$ lsof | grep output
java 2223 ikatsov 13r REG 8,1 289196 16384629 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0003/attempt_local_0003_r_000000_0/output/map_0.out

$ lsof | grep output
java 12187 ikatsov 14r REG 8,1 349196 17039996 /tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_0.out
java 12187 ikatsov 17r REG 8,1 349196 17039999
/tmp/hadoop-ikatsov/mapred/local/taskTracker/ikatsov/jobcache/job_local_0001/attempt_local_0001_r_000000_0/output/map_1.out

I wonder, is there a way to specify the temporary Hadoop directory (mapreduce.cluster.local.dir) when launching Pig in local mode?

Thank you in advance,
Ilya
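P.S. One workaround I'm considering, assuming the pig launcher script passes PIG_OPTS through to the JVM as system properties (so -D values reach the embedded Hadoop configuration), is to give every test process its own local directory. The property name differs by Hadoop version (mapred.local.dir in 0.20/1.x, mapreduce.cluster.local.dir later), so the sketch below sets both; the script name is just a placeholder:

```shell
# Sketch: give each Pig local-mode process a private Hadoop local dir,
# so parallel runs don't collide under /tmp/hadoop-$USER/mapred/local.
# $$ is the shell's PID, unique per test process.
LOCAL_DIR="/tmp/hadoop-$USER/mapred-local-$$"
mkdir -p "$LOCAL_DIR"

# Assumption: pig forwards PIG_OPTS as JVM flags, making these -D
# properties visible to the LocalJobRunner's Hadoop configuration.
# Both old and new property names are set to cover either version.
export PIG_OPTS="-Dmapred.local.dir=$LOCAL_DIR -Dmapreduce.cluster.local.dir=$LOCAL_DIR"

# Then launch each test as usual, e.g.:
#   pig -x local test_script.pig   # test_script.pig is a placeholder
echo "$PIG_OPTS"
```

If the property is not picked up this way, the same values could be put into a per-test mapred-site.xml on the classpath instead.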