I'm growing very frustrated with a simple cluster setup. I can get
the cluster set up on two machines, but have trouble when trying to
extend the installation to 3 or more boxes. I keep seeing the errors
below. It seems the reduce tasks can't get access to the data.
I can't figure out how to fix this error. What amazes me is
that the file-not-found issues appear on the master box as well as on
the slaves. What causes the reduce tasks to fail to find the data via
localhost?
Setup/Errors:
My basic setup comes from: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
(Michael Noll's setup). I've put the following in my /etc/hosts
file:
127.0.0.1 localhost
10.1.1.12 master
10.1.1.10 slave
10.1.1.13 slave1
I have also set up transparent ssh to all boxes (and it works). All
boxes can see each other, etc.
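One way to double-check the name resolution described above is to compare what each box actually resolves against the intended /etc/hosts entries. The following is only a minimal sketch: `parse_hosts`, `loopback_names` and `check_live` are hypothetical helper names, and the hosts text is simply copied from the entries listed above.

```python
import socket

# The /etc/hosts entries quoted above, copied verbatim.
HOSTS_TEXT = """\
127.0.0.1 localhost
10.1.1.12 master
10.1.1.10 slave
10.1.1.13 slave1
"""


def parse_hosts(text):
    """Return {hostname: ip} for every name in hosts-file formatted text."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first entry seen for a name
    return mapping


def loopback_names(mapping):
    """Return the sorted names that map to a 127.x.x.x address."""
    return sorted(n for n, ip in mapping.items() if ip.startswith("127."))


def check_live(mapping):
    """Print what this machine resolves for each expected name.

    Assumption: this is run by hand on each box; it is not part of Hadoop.
    """
    for name in sorted(mapping) + [socket.gethostname()]:
        try:
            actual = socket.gethostbyname(name)
        except OSError as exc:
            actual = "unresolved (%s)" % exc
        print("%-12s expected=%-12s actual=%s"
              % (name, mapping.get(name, "?"), actual))
```

Running `check_live(parse_hosts(HOSTS_TEXT))` on the master and on each slave would show whether any box resolves its own hostname, or another node's, to a loopback address.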
My base level hadoop-site.xml is:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-datastore</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:54311</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
Errors:
WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_200810301206_0004_m_000001_0,0) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200810301206_0004/attempt_200810301206_0004_m_000001_0/output/file.out.index in any of the configured local directories
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:359)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)...
and in the userlog of the attempt:
2008-10-30 12:28:00,806 WARN org.apache.hadoop.mapred.ReduceTask: java.io.FileNotFoundException: http://localhost:50060/mapOutput?job=job_200810301206_0004&map=attempt_200810301206_0004_m_000001_0&reduce=0
    at sun.reflect.GeneratedConstructorAccessor3.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
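The URL in the userlog shows which host and port the reduce task tried to fetch the map output from. As a rough sketch (the `describe_fetch` helper is hypothetical and not a Hadoop API), the URL can be split into its parts to make the target host explicit:

```python
from urllib.parse import urlsplit, parse_qs

# The failing URL, copied from the userlog above.
FAILED_URL = (
    "http://localhost:50060/mapOutput"
    "?job=job_200810301206_0004"
    "&map=attempt_200810301206_0004_m_000001_0"
    "&reduce=0"
)


def describe_fetch(url):
    """Split a mapOutput fetch URL into host, port and query fields."""
    parts = urlsplit(url)
    # parse_qs returns a list per key; each key appears once here.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "job": query.get("job"),
        "map": query.get("map"),
        "reduce": query.get("reduce"),
    }


if __name__ == "__main__":
    for field, value in describe_fetch(FAILED_URL).items():
        print("%-7s %s" % (field, value))
```

This only extracts fields from the logged URL. It makes visible that the fetch was addressed to `localhost` rather than to `master`, `slave` or `slave1`, which may be worth comparing with the hostname each TaskTracker reports for itself.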