Sorry for the clerical mistake in my question: in my earlier message I pasted the core-site.xml configuration under the hdfs-site.xml heading. I do, of course, rsync all the slaves from the master. Thank you for your help. By the way, could there be another cause you know of, such as too many IP-host mapping pairs in a node's /etc/hosts?
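For example, whether the extra mappings matter can be checked on each node with something like this (a minimal sanity check; getent is assumed to be available):

    $ hostname             # should print the node's own name, e.g. vm153 on the master
    $ getent hosts vm153   # should print 192.168.0.153, never 127.0.0.1

If any node resolves a cluster hostname to a loopback address, reducers on other nodes can end up trying to fetch map output from themselves, which would match the fetch failures below.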
2010-08-18

shangan

From: xiujin yang
Sent: 2010-08-18 17:35:30
To: [email protected]
Subject: RE: mapreduce doesn't work in my cluster

Hi Shangan,

I found a strange thing: the two hdfs-site.xml files you posted are different. Please check: do you rsync the slaves with the master?

hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://vm153:9000</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>20</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>300</value>
    <description>The number of seconds between two periodic checkpoints.</description>
  </property>
</configuration>

[shan...@vm153 conf]$ more hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/home/shangan/bin/hadoop-0.20.2/conf/exclude</value>
  </property>
</configuration>

Best,

Xiujin Yang

> From: [email protected]
> Date: Wed, 18 Aug 2010 19:30:34 +1000
> Subject: Re: RE: mapreduce doesn't work in my cluster
> To: [email protected]
>
> Please remove the localhost entries, and you will probably be fine.
>
> Regards
> Akash Deep Shakya "OpenAK"
> University of New South Wales
> akashakya at gmail dot com
>
> ~ Failure to prepare is preparing to fail ~
>
> 2010/8/18 shangan <[email protected]>
>
> > 127.0.0.1 localhost.localdomain localhost
> > ::1 localhost6.localdomain6 localhost6
> > 192.168.0.153 vm153
> > 192.168.0.148 vm148
> > 192.168.0.152 vm152
> > 192.168.0.154 vm154
> >
> > vm153, vm148, vm152 and vm154 are the nodes I'm using in the cluster. Each node also has some other IP-host mapping pairs for other uses, and I don't know whether that will affect anything. Looking forward to your further help; I really appreciate it.
> >
> > 2010-08-18
> >
> > shangan
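The pattern that advice targets (my reading of it; the lines posted above look clean) is a node's own hostname appearing on a loopback line, for example:

    127.0.0.1   vm153 localhost.localdomain localhost   # problematic: vm153 resolves to loopback

Since each node carries extra mapping pairs beyond what was posted, a quick way to scan the loopback lines on every node is something like:

    $ grep -nE '^(127\.0\.0\.1|::1)' /etc/hosts   # no cluster hostname should appear on these lines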
> > From: xiujin yang
> > Sent: 2010-08-18 17:08:27
> > To: [email protected]
> > Subject: RE: mapreduce doesn't work in my cluster
> >
> > Hi Shangan,
> >
> > Please check your /etc/hosts to see whether all machines are set up in it.
> >
> > Best,
> >
> > Yang.
> >
> > > Date: Wed, 18 Aug 2010 15:01:46 +0800
> > > From: [email protected]
> > > To: [email protected]
> > > Subject: mapreduce doesn't work in my cluster
> > >
> > > My cluster consists of 4 nodes: 1 namenode and 3 datanodes. It works well as HDFS, but when I run MapReduce jobs they take quite a long time and produce a lot of "Too many fetch-failures" errors. I've checked the log on a datanode and copied part of it below:
> > >
> > > 2010-08-18 14:28:33,142 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map output: attempt_201008171837_0007_m_000006_1. Ignored.
> > > 2010-08-18 14:28:33,143 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.0.1:50060, dest: 127.0.0.1:54245, bytes: 0, op: MAPRED_SHUFFLE, cliID: attempt_201008171837_0007_m_000006_1
> > > 2010-08-18 14:28:33,143 WARN org.mortbay.log: /mapOutput: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008171837_0007/attempt_201008171837_0007_m_000006_1/output/file.out.index in any of the configured local directories
> > > 2010-08-18 14:28:34,766 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 at 0.00 MB/s) >
> > > 2010-08-18 14:28:37,675 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 at 0.00 MB/s) >
> > > 2010-08-18 14:28:40,775 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 at 0.00 MB/s) >
> > > 2010-08-18 14:28:43,683 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 at 0.00 MB/s) >
> > > 2010-08-18 14:28:43,779 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 at 0.00 MB/s) >
> > > 2010-08-18 14:28:46,687 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 at 0.00 MB/s) >
> > > 2010-08-18 14:28:49,787 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 at 0.00 MB/s) >
> > > 2010-08-18 14:28:52,696 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 at 0.00 MB/s) >
> > > 2010-08-18 14:28:55,796 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 at 0.00 MB/s) >
> > > 2010-08-18 14:28:58,704 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 at 0.00 MB/s) >
> > > 2010-08-18 14:28:58,800 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 at 0.00 MB/s) >
> > > 2010-08-18 14:29:01,710 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 at 0.00 MB/s) >
> > > 2010-08-18 14:29:04,808 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 at 0.00 MB/s) >
> > > 2010-08-18 14:29:05,225 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_201008171837_0007_m_000006_1,0) failed :
> > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008171837_0007/attempt_201008171837_0007_m_000006_1/output/file.out.index in any of the configured local directories
> > >     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
> > >     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
> > >     at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2887)
> > >     at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> > >     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> > >     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
> > >     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
> > >     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > >     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> > >     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > >     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> > >     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > >     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > >     at org.mortbay.jetty.Server.handle(Server.java:324)
> > >     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> > >     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> > >     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> > >     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> > >     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> > >     at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> > >     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> > > 2010-08-18 14:29:05,225 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map output: attempt_201008171837_0007_m_000006_1. Ignored.
> > > 2010-08-18 14:29:05,259 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.0.1:50060, dest: 127.0.0.1:54288, bytes: 0, op: MAPRED_SHUFFLE, cliID: attempt_201008171837_0007_m_000006_1
> > > 2010-08-18 14:29:05,259 WARN org.mortbay.log: /mapOutput: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008171837_0007/attempt_201008171837_0007_m_000006_1/output/file.out.index in any of the configured local directories
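Note the clienttrace entries above: both src and dest are 127.0.0.1, so the shuffle requests are going over loopback. A quick way to see which address the TaskTracker's shuffle port is listening on (a sketch; netstat from net-tools assumed, run as root for the -p column):

    $ netstat -tlnp | grep 50060   # 50060 is the default TaskTracker HTTP (shuffle) port

A listener on 0.0.0.0:50060 is normal; the more likely problem is that reducers resolve a remote node's name to 127.0.0.1 and end up asking their own TaskTracker, which never ran the map attempt and therefore logs "Unknown child with bad map output".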
> > > Almost all datanodes behave the same way: the reduces seem unable to get the map results from the other datanodes. I also looked at the charts in the job administration page, and the copy phase does last quite a long time. Can anybody give me an explanation? The following is my configuration of hadoop-0.20.2:
> > >
> > > core-site.xml
> > >
> > > <?xml version="1.0"?>
> > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > <!-- Put site-specific property overrides in this file. -->
> > > <configuration>
> > >   <property>
> > >     <name>fs.default.name</name>
> > >     <value>hdfs://vm153:9000</value>
> > >   </property>
> > >   <property>
> > >     <name>fs.trash.interval</name>
> > >     <value>20</value>
> > >   </property>
> > >   <property>
> > >     <name>fs.checkpoint.period</name>
> > >     <value>300</value>
> > >     <description>The number of seconds between two periodic checkpoints.</description>
> > >   </property>
> > > </configuration>
> > >
> > > hdfs-site.xml
> > >
> > > <?xml version="1.0"?>
> > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > <!-- Put site-specific property overrides in this file. -->
> > > <configuration>
> > >   <property>
> > >     <name>fs.default.name</name>
> > >     <value>hdfs://vm153:9000</value>
> > >   </property>
> > >   <property>
> > >     <name>fs.trash.interval</name>
> > >     <value>20</value>
> > >   </property>
> > >   <property>
> > >     <name>fs.checkpoint.period</name>
> > >     <value>300</value>
> > >     <description>The number of seconds between two periodic checkpoints.</description>
> > >   </property>
> > > </configuration>
> > > [shan...@vm153 conf]$ more hdfs-site.xml
> > > <?xml version="1.0"?>
> > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > <!-- Put site-specific property overrides in this file. -->
> > > <configuration>
> > >   <property>
> > >     <name>dfs.replication</name>
> > >     <value>2</value>
> > >   </property>
> > >   <property>
> > >     <name>dfs.hosts.exclude</name>
> > >     <value>/home/shangan/bin/hadoop-0.20.2/conf/exclude</value>
> > >   </property>
> > > </configuration>
> > >
> > > mapred-site.xml
> > >
> > > <?xml version="1.0"?>
> > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > <!-- Put site-specific property overrides in this file. -->
> > > <configuration>
> > >   <property>
> > >     <name>mapred.job.tracker</name>
> > >     <value>vm153:9001</value>
> > >   </property>
> > >   <property>
> > >     <name>mapred.map.tasks</name>
> > >     <value>20</value>
> > >   </property>
> > >   <property>
> > >     <name>mapred.reduce.tasks</name>
> > >     <value>5</value>
> > >   </property>
> > > </configuration>
> > >
> > > What's the problem? Do I need to configure other parameters? There are parameters like dfs.secondary.http.address and dfs.datanode.address whose IP is 0.0.0.0; do I need to change them?
> > >
> > > 2010-08-18
> > >
> > > shangan
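About the DiskChecker$DiskErrorException in the log: the "configured local directories" are mapred.local.dir, which in 0.20 defaults to ${hadoop.tmp.dir}/mapred/local, i.e. under /tmp/hadoop-<user> unless overridden. If a node's local directory is missing or unwritable, one option is to pin it explicitly in mapred-site.xml; a sketch, with a hypothetical path that would need to exist and be writable by the Hadoop user on every node:

    <property>
      <name>mapred.local.dir</name>
      <!-- hypothetical path; create and chown it on every node before restarting -->
      <value>/home/shangan/mapred-local</value>
    </property>

That said, the paired "Unknown child ... Ignored" warnings suggest the shuffle request reached a TaskTracker that never ran that map attempt at all, which points back at hostname resolution rather than disks. The 0.0.0.0 values in dfs.secondary.http.address and dfs.datanode.address just mean "bind to all interfaces" and normally do not need changing.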
