Hello,

From time to time I get the following error:
Error initializing attempt_201008101445_0212_r_000002_0:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_201008101445_0212/job.xml
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:750)
        at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1664)
        at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:97)
        at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1629)

If I restart the job without making any changes to the cluster or the disks, it eventually works. After a while I get the same error again for a different job. Restarting the job always seems to work, but this is very annoying.

I searched online, and it seems this error is triggered when there is not enough space on any of the disks. That is not the case for me: each node has 200GB of free space. Is there anything else I can check besides the free space on the disks?

Thanks!
Rares
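For what it's worth, here is roughly what I have been checking on each node so far. This is just a sketch: the directory list is a placeholder, and you would substitute the actual comma-separated entries from mapred.local.dir in mapred-site.xml. My understanding (which may be wrong) is that DiskChecker can also reject a directory that exists but is missing or not writable by the TaskTracker user, not only one that is full, so the sketch checks both free space and writability:

```shell
#!/bin/sh
# Placeholder list -- replace with the entries from mapred.local.dir.
DIRS="/tmp"

for d in $DIRS; do
  echo "== $d =="
  # Free space (KB) on the filesystem holding this directory (POSIX df -P).
  df -Pk "$d" | awk 'NR==2 {print $4 " KB free on " $6}'
  # A directory that is missing or unwritable would also be skipped
  # by the local-dir allocator, even with plenty of free space.
  if [ -d "$d" ] && [ -w "$d" ]; then
    echo "writable"
  else
    echo "MISSING or NOT writable"
  fi
done
```

So far every directory comes back writable with plenty of space, which is why I am puzzled.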