My cluster consists of 4 nodes: 1 namenode and 3 datanodes. It works well as HDFS, but when I run MapReduce jobs they take quite a long time and report a lot of "Too many fetch-failures" errors. I've checked the logs on the datanodes and copied part of them below:
2010-08-18 14:28:33,142 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map output: attempt_201008171837_0007_m_000006_1. Ignored.
2010-08-18 14:28:33,143 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.0.1:50060, dest: 127.0.0.1:54245, bytes: 0, op: MAPRED_SHUFFLE, cliID: attempt_201008171837_0007_m_000006_1
2010-08-18 14:28:33,143 WARN org.mortbay.log: /mapOutput: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008171837_0007/attempt_201008171837_0007_m_000006_1/output/file.out.index in any of the configured local directories
2010-08-18 14:28:34,766 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 at 0.00 MB/s) >
2010-08-18 14:28:37,675 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 at 0.00 MB/s) >
[... the same two attempts keep logging "reduce > copy (19 of 20 at 0.00 MB/s)" every few seconds ...]
2010-08-18 14:29:01,710 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 at 0.00 MB/s) >
2010-08-18 14:29:04,808 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 at 0.00 MB/s) >
2010-08-18 14:29:05,225 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_201008171837_0007_m_000006_1,0) failed : org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008171837_0007/attempt_201008171837_0007_m_000006_1/output/file.out.index in any of the configured local directories
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2887)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:324)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
2010-08-18 14:29:05,225 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map output: attempt_201008171837_0007_m_000006_1. Ignored.
2010-08-18 14:29:05,259 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.0.1:50060, dest: 127.0.0.1:54288, bytes: 0, op: MAPRED_SHUFFLE, cliID: attempt_201008171837_0007_m_000006_1
2010-08-18 14:29:05,259 WARN org.mortbay.log: /mapOutput: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008171837_0007/attempt_201008171837_0007_m_000006_1/output/file.out.index in any of the configured local directories
Almost all of the datanodes behave the same way; it seems the reducers can't fetch the map output from the other datanodes. I also notice that the clienttrace lines above show both src and dest as 127.0.0.1. I looked at the charts in the job administration page as well, and the copy phase really did last quite a long time. Can anybody give me an explanation?
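By the way, I haven't set mapred.local.dir anywhere, so map outputs go to the default location under hadoop.tmp.dir (also unset, so it defaults to /tmp/hadoop-${user.name}, which the OS may clean up). Would pinning both to a dedicated directory help? A sketch of what I mean (the paths below are only placeholders, not my real layout):

<property>
  <name>hadoop.tmp.dir</name>
  <!-- goes in core-site.xml; placeholder path, the default is /tmp/hadoop-${user.name} -->
  <value>/home/shangan/hadoop-tmp</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <!-- goes in mapred-site.xml; placeholder path, the default is ${hadoop.tmp.dir}/mapred/local -->
  <value>/home/shangan/hadoop-local</value>
</property>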
Here is my configuration for Hadoop 0.20.2:
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://vm153:9000</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>20</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>300</value>
    <description>The number of seconds between two periodic checkpoints.</description>
  </property>
</configuration>
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/home/shangan/bin/hadoop-0.20.2/conf/exclude</value>
  </property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>vm153:9001</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>5</value>
  </property>
</configuration>
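Would shuffle-related parameters such as tasktracker.http.threads or mapred.reduce.parallel.copies matter here? For example (the values below are only guesses on my part; the 0.20 defaults are 40 and 5):

<property>
  <name>tasktracker.http.threads</name>
  <!-- guessed value; the default is 40 -->
  <value>80</value>
</property>
<property>
  <name>mapred.reduce.parallel.copies</name>
  <!-- guessed value; the default is 5 -->
  <value>10</value>
</property>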
WHAT'S THE PROBLEM? Do I need to configure other parameters? There are parameters like dfs.secondary.http.address and dfs.datanode.address, whose IP is 0.0.0.0; do I need to change them?
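If they do need changing, would binding them to concrete hosts be the right direction, roughly like this? (vm153 is my namenode; vm154 below is only a placeholder for one datanode, and dfs.datanode.address would have to be set per node.)

<property>
  <name>dfs.secondary.http.address</name>
  <!-- host running the secondary namenode; the default is 0.0.0.0:50090 -->
  <value>vm153:50090</value>
</property>
<property>
  <name>dfs.datanode.address</name>
  <!-- per-datanode address; the default is 0.0.0.0:50010 -->
  <value>vm154:50010</value>
</property>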
2010-08-18
shangan