So it's not just at 16%, but depends on the task:
2008-10-30 13:58:29,702 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200810301345_0001_r_000000_0 0.25675678% reduce > copy (57 of 74 at 13.58 MB/s) >
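(If I'm reading the progress number right, 0.25675678 is just the copy phase: reduce progress counts copy/sort/reduce as thirds, so 57/74 ≈ 0.77, divided by 3 ≈ 0.2567. In other words this attempt has copied 57 of the 74 map outputs before it runs into the failed getMapOutput below.)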
2008-10-30 13:58:29,357 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_200810301345_0001_m_000048_0,0) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200810301345_0001/attempt_200810301345_0001_m_000048_0/output/file.out.index in any of the configured local directories
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:359)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2402)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
I'm out of ideas on what the problem could be...
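For what it's worth, nothing in the hadoop-site.xml below sets mapred.local.dir, so if I understand the defaults correctly the TaskTrackers should be using ${hadoop.tmp.dir}/mapred/local, which would put the file it's complaining about at roughly:

/opt/hadoop-datastore/mapred/local/taskTracker/jobcache/job_200810301345_0001/attempt_200810301345_0001_m_000048_0/output/file.out.index

(That assumes the stock default for mapred.local.dir and no override anywhere else.)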
On Oct 30, 2008, at 12:35 PM, Scott Whitecross wrote:
I'm growing very frustrated with a simple cluster setup. I can get the cluster set up on two machines, but I run into trouble when trying to extend the installation to 3 or more boxes. I keep seeing the errors below. It seems the reduce tasks can't get access to the data, and I can't figure out how to fix the error. What amazes me is that the file-not-found issues appear on the master box as well as the slaves. What causes the reduce tasks to fail to find this data via localhost?
Setup/Errors:
My basic setup comes from Michael Noll's guide: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
I've put the following in my /etc/hosts file:
127.0.0.1 localhost
10.1.1.12 master
10.1.1.10 slave
10.1.1.13 slave1
And I have set up passwordless ssh to all boxes (and it works). All boxes can see each other, etc.
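Per the tutorial, the master also needs conf/masters and conf/slaves files listing these hostnames; assuming its layout (where the master doubles as a worker), they would look like:

conf/masters:
master

conf/slaves:
master
slave
slave1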
My base-level hadoop-site.xml is:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-datastore</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:54311</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
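(For reference, nothing above sets mapred.local.dir, so if I'm reading hadoop-default.xml right the TaskTrackers fall back to its default, which is equivalent to:

<property>
  <name>mapred.local.dir</name>
  <value>${hadoop.tmp.dir}/mapred/local</value>
</property>

i.e. /opt/hadoop-datastore/mapred/local given the hadoop.tmp.dir above.)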
Errors:
WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_200810301206_0004_m_000001_0,0) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200810301206_0004/attempt_200810301206_0004_m_000001_0/output/file.out.index in any of the configured local directories
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:359)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)...
and in the userlog of the attempt:
2008-10-30 12:28:00,806 WARN org.apache.hadoop.mapred.ReduceTask: java.io.FileNotFoundException: http://localhost:50060/mapOutput?job=job_200810301206_0004&map=attempt_200810301206_0004_m_000001_0&reduce=0
    at sun.reflect.GeneratedConstructorAccessor3.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
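The part that puzzles me most is that the fetch URL above says localhost:50060 rather than the slave's real hostname. From what I've read, a TaskTracker can end up advertising itself as "localhost" if the machine's own hostname resolves to a loopback address (Ubuntu often adds a 127.0.1.1 <hostname> line), so I'm wondering whether every box's /etc/hosts has to look exactly like the listing above, with no line such as:

127.0.1.1   slave1

mapping the machine's own name to loopback. That's just a guess on my part, though.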