Hi,

I am trying to run a sort job (from hadoop-0.20.2-examples.jar) on 50GB of data (generated using randomwriter). I am using hadoop-0.20.2 on a cluster of 3 machines, with one machine serving as the master and the other two as slaves.
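For reference, this is roughly how I generate the input and launch the job (the HDFS paths below are just placeholders, and I raised randomwriter's output size in its job configuration so that it writes 50GB in total):

bin/hadoop jar hadoop-0.20.2-examples.jar randomwriter rand-input
bin/hadoop jar hadoop-0.20.2-examples.jar sort rand-input rand-sorted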
I get the following errors for various task attempts:

=======================================================================
11/06/23 07:57:14 INFO mapred.JobClient: Task Id : attempt_201106230747_0001_m_000119_0, Status : FAILED
Error: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:282)
        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:190)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1298)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)

Error initializing attempt_201106230747_0001_m_000119_0:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_201106230747_0001/job.xml
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:750)
        at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1664)
        at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:97)
        at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1629)
=======================================================================

Running "hadoop dfsadmin -report" gives me the following:

==================================================================
Configured Capacity: 465230045184 (433.28 GB)
Present Capacity: 440799092736 (410.53 GB)
DFS Remaining: 371988148224 (346.44 GB)
DFS Used: 68810944512 (64.09 GB)
DFS Used%: 15.61%
Under replicated blocks: 1
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Name: 10.1.1.4:50010
Decommission Status : Normal
Configured Capacity: 232615022592 (216.64 GB)
DFS Used: 32243871744 (30.03 GB)
Non DFS Used: 12215377920 (11.38 GB)
DFS Remaining: 188155772928 (175.23 GB)
DFS Used%: 13.86%
DFS Remaining%: 80.89%
Last contact: Thu Jun 23 08:04:51 MDT 2011

Name: 10.1.1.3:50010
Decommission Status : Normal
Configured Capacity: 232615022592 (216.64 GB)
DFS Used: 36567072768 (34.06 GB)
Non DFS Used: 12215574528 (11.38 GB)
DFS Remaining: 183832375296 (171.21 GB)
DFS Used%: 15.72%
DFS Remaining%: 79.03%
Last contact: Thu Jun 23 08:04:51 MDT 2011
==================================================================

I have the following parameters configured in core-site.xml and mapred-site.xml:

*core-site.xml:*

<property>
  <name>hadoop.tmp.dir</name>
  <value>/mnt/local/mapred/</value>
</property>

*mapred-site.xml:*

<property>
  <name>mapred.system.dir</name>
  <value>/mnt/local/mapred/system</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/mnt/local/mapred/local</value>
</property>
<property>
  <name>mapred.temp.dir</name>
  <value>/mnt/local/mapred/temp</value>
</property>

/mnt/ is on a local disk at each node in my cluster; it is just 17% full, with a total disk capacity of around 220GB. Each of the above directories was created with read/write permissions. I don't see why I am getting the "No space left on device" error with this configuration.

Any ideas on how to solve this problem?

Thanks,
Virajith
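P.S. The "17% full" figure above comes from checking the mount point directly on each node with the standard df/du commands, e.g.:

df -h /mnt
du -sh /mnt/local/mapred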