Hi,

I am trying to run a sort job (from hadoop-0.20.2-examples.jar) on 50GB of data (generated using randomwriter). I am using hadoop-0.20.2 on a cluster of 3 machines, with one machine serving as the master and the other two as slaves.
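For reference, this is roughly how I generate the input and launch the job (the HDFS paths below are just placeholders, and I raised randomwriter's output size in its job configuration so that it writes 50GB in total):

bin/hadoop jar hadoop-0.20.2-examples.jar randomwriter rand-input
bin/hadoop jar hadoop-0.20.2-examples.jar sort rand-input rand-sorted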
I get the following errors for various task attempts:

=======================================================================
11/06/23 07:57:14 INFO mapred.JobClient: Task Id : attempt_201106230747_0001_m_000119_0, Status : FAILED
Error: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:282)
        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:190)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1298)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)

Error initializing attempt_201106230747_0001_m_000119_0:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_201106230747_0001/job.xml
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:750)
        at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1664)
        at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:97)
        at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1629)
=======================================================================

Running "hadoop dfsadmin -report" gives me the following:

==================================================================
Configured Capacity: 465230045184 (433.28 GB)
Present Capacity: 440799092736 (410.53 GB)
DFS Remaining: 371988148224 (346.44 GB)
DFS Used: 68810944512 (64.09 GB)
DFS Used%: 15.61%
Under replicated blocks: 1
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Name: 10.1.1.4:50010
Decommission Status : Normal
Configured Capacity: 232615022592 (216.64 GB)
DFS Used: 32243871744 (30.03 GB)
Non DFS Used: 12215377920 (11.38 GB)
DFS Remaining: 188155772928 (175.23 GB)
DFS Used%: 13.86%
DFS Remaining%: 80.89%
Last contact: Thu Jun 23 08:04:51 MDT 2011

Name: 10.1.1.3:50010
Decommission Status : Normal
Configured Capacity: 232615022592 (216.64 GB)
DFS Used: 36567072768 (34.06 GB)
Non DFS Used: 12215574528 (11.38 GB)
DFS Remaining: 183832375296 (171.21 GB)
DFS Used%: 15.72%
DFS Remaining%: 79.03%
Last contact: Thu Jun 23 08:04:51 MDT 2011
==================================================================

I have the following parameters configured in core-site.xml and mapred-site.xml:

*core-site.xml:*

<property>
  <name>hadoop.tmp.dir</name>
  <value>/mnt/local/mapred/</value>
</property>

*mapred-site.xml:*

<property>
  <name>mapred.system.dir</name>
  <value>/mnt/local/mapred/system</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/mnt/local/mapred/local</value>
</property>
<property>
  <name>mapred.temp.dir</name>
  <value>/mnt/local/mapred/temp</value>
</property>

/mnt/ is on a local disk at each node in my cluster; it is just 17% full, with a total disk capacity of around 220GB. Each of the above directories was created with read/write permissions. I don't see why I am getting the "No space left on device" error with this configuration.

Any ideas on how to solve this problem?

Thanks,
Virajith
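P.S. The "17% full" figure above comes from checking the mount point directly on each node with the standard df/du commands, e.g.:

df -h /mnt
du -sh /mnt/local/mapred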