Can I have extra IP-host mapping pairs in the hosts file? Just take 192.168.0.148 
for example: will the extra entries make vm148 stop working?
127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.0.153           vm153
192.168.0.148           vm148
192.168.0.152           vm152
192.168.0.154           vm154
192.168.0.148           s.com.cn
192.168.0.148           s1.com.cn

Must the hosts file contain all the nodes and no other records, and must all 
nodes have the same configuration?
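
To check how a name resolves on each node, I run something like this (assuming 
the standard Linux tools are available):

getent hosts vm148            # should print 192.168.0.148  vm148
getent hosts 192.168.0.148    # reverse lookup via /etc/hosts
hostname -f                   # should match this node's own hosts entry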


2010-08-18 



shangan 



From: xiujin yang 
Sent: 2010-08-18 18:11:10 
To: [email protected] 
Cc: 
Subject: RE: mapreduce doesn't work in my cluster 
 
Hi Shangan,
I think the problem is caused by a configuration issue. 
1. Make sure the namenode and all datanodes have the same hosts configuration. 
2. Check all *.xml config files; a quick way to compare them across nodes is sketched below. 
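A rough way to compare the files (host names and the conf path are taken from 
your mails, so adjust them as needed):

for h in vm148 vm152 vm154; do
  ssh $h "md5sum /etc/hosts /home/shangan/bin/hadoop-0.20.2/conf/*.xml"
done
# run the same md5sum on vm153 and compare the checksums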
> > WHAT'S THE PROBLEM? Do I need to configure other parameters? There are 
> > parameters like dfs.secondary.http.address and dfs.datanode.address, whose 
> > IP is 0.0.0.0; do I need to change them?
No, the defaults will be OK; 0.0.0.0 just means the daemon binds to all local interfaces. 
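If you want to double-check the bind on a datanode, something like this should 
work (50010 is the stock 0.20 datanode data-transfer port):

netstat -tlnp | grep 50010    # a listener on 0.0.0.0:50010 is expected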
Best,
Yang. 
> Date: Wed, 18 Aug 2010 17:16:42 +0800
> From: [email protected]
> To: [email protected]
> Subject: Re: RE: mapreduce doesn't work in my cluster
> 
> 127.0.0.1               localhost.localdomain localhost
> ::1             localhost6.localdomain6 localhost6
> 192.168.0.153           vm153
> 192.168.0.148           vm148
> 192.168.0.152           vm152
> 192.168.0.154           vm154
> 
> vm153, vm148, vm152, and vm154 are the nodes I'm using in the cluster. Each 
> node also has some other IP-host mapping pairs for other purposes, and I don't 
> know whether that will have an effect. Looking forward to your further help; I 
> really appreciate it.
> 
> 
> 2010-08-18 
> 
> 
> 
> shangan 
> 
> 
> 
> From: xiujin yang 
> Sent: 2010-08-18 17:08:27 
> To: [email protected] 
> Cc: 
> Subject: RE: mapreduce doesn't work in my cluster 
>  
> Hi Shangan,
> Please check your /etc/hosts and make sure all machines are set up there. 
> Best,
> Yang.
> > Date: Wed, 18 Aug 2010 15:01:46 +0800
> > From: [email protected]
> > To: [email protected]
> > Subject: mapreduce doesn't work in my cluster
> > 
> > My cluster consists of 4 nodes: 1 namenode and 3 datanodes. It works well 
> > as HDFS, but when I run mapreduce jobs they take quite a long time and there 
> > are a lot of "Too many fetch-failures" errors. I've checked the logs on the 
> > datanodes and copied part of them below:
> > 
> > 
> > 
> > 2010-08-18 14:28:33,142 WARN org.apache.hadoop.mapred.TaskTracker: Unknown 
> > child with bad map output: attempt_201008171837_0007_m_000006_1. Ignored.
> > 2010-08-18 14:28:33,143 INFO 
> > org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.0.1:50060, 
> > dest: 127.0.0.1:54245, bytes: 0, op: MAPRED_SHUFFLE, cliID: 
> > attempt_201008171837_0007_m_000006_1
> > 2010-08-18 14:28:33,143 WARN org.mortbay.log: /mapOutput: 
> > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
> > taskTracker/jobcache/job_201008171837_0007/attempt_201008171837_0007_m_000006_1/output/file.out.index
> >  in any of the configured local directories
> > 2010-08-18 14:28:34,766 INFO org.apache.hadoop.mapred.TaskTracker: 
> > attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 at 
> > 0.00 MB/s) > 
> > 2010-08-18 14:28:37,675 INFO org.apache.hadoop.mapred.TaskTracker: 
> > attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 at 
> > 0.00 MB/s) > 
> > [... the two progress lines above repeat alternately for 
> > attempt_201008171837_0007_r_000000_1 and attempt_201008171837_0007_r_000003_1, 
> > still at "reduce > copy (19 of 20 at 0.00 MB/s)", every ~3 seconds until 
> > 2010-08-18 14:29:04 ...]
> > 2010-08-18 14:29:05,225 WARN org.apache.hadoop.mapred.TaskTracker: 
> > getMapOutput(attempt_201008171837_0007_m_000006_1,0) failed :
> > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
> > taskTracker/jobcache/job_201008171837_0007/attempt_201008171837_0007_m_000006_1/output/file.out.index
> >  in any of the configured local directories
> >         at 
> > org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
> >         at 
> > org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
> >         at 
> > org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2887)
> >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> >         at 
> > org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
> >         at 
> > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
> >         at 
> > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >         at 
> > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> >         at 
> > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> >         at 
> > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> >         at 
> > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> >         at 
> > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> >         at org.mortbay.jetty.Server.handle(Server.java:324)
> >         at 
> > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> >         at 
> > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> >         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> >         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> >         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> >         at 
> > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> >         at 
> > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> > 2010-08-18 14:29:05,225 WARN org.apache.hadoop.mapred.TaskTracker: Unknown 
> > child with bad map output: attempt_201008171837_0007_m_000006_1. Ignored.
> > 2010-08-18 14:29:05,259 INFO 
> > org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.0.1:50060, 
> > dest: 127.0.0.1:54288, bytes: 0, op: MAPRED_SHUFFLE, cliID: 
> > attempt_201008171837_0007_m_000006_1
> > 2010-08-18 14:29:05,259 WARN org.mortbay.log: /mapOutput: 
> > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
> > taskTracker/jobcache/job_201008171837_0007/attempt_201008171837_0007_m_000006_1/output/file.out.index
> >  in any of the configured local directories
> > 
> > 
> > 
> > Almost all datanodes behave the same way; it seems the reducers can't fetch 
> > the map output from the other datanodes. I also looked at the charts in the 
> > job administration page, and the copy phase did last quite a long time. Can 
> > anybody give me an explanation? The following is my configuration for 
> > hadoop-0.20.2:
> > 
> > core-site.xml
> > 
> > <?xml version="1.0"?>
> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > <!-- Put site-specific property overrides in this file. -->
> > <configuration>
> >    <property>
> >         <name>fs.default.name</name>
> >         <value>hdfs://vm153:9000</value>
> >    </property>
> >    <property>
> >         <name>fs.trash.interval</name>
> >         <value>20</value>
> >    </property>
> > <property>
> >   <name>fs.checkpoint.period</name>
> >   <value>300</value>
> >   <description>The number of seconds between two periodic checkpoints.
> >   </description>
> > </property>
> > </configuration>
> > 
> > hdfs-site.xml
> > 
> > <?xml version="1.0"?>
> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > <!-- Put site-specific property overrides in this file. -->
> > <configuration>
> > <property>
> >   <name>dfs.replication</name>
> >   <value>2</value>
> > </property>
> > <property>
> >   <name>dfs.hosts.exclude</name>
> >   <value>/home/shangan/bin/hadoop-0.20.2/conf/exclude</value>
> > </property>
> > </configuration>
> > 
> > mapred-site.xml
> > 
> > <?xml version="1.0"?>
> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > <!-- Put site-specific property overrides in this file. -->
> > <configuration>
> >    <property>
> >         <name>mapred.job.tracker</name>
> >         <value>vm153:9001</value>
> >    </property>
> >    <property>
> >         <name>mapred.map.tasks</name>
> >         <value>20</value>
> >    </property>
> >    <property>
> >         <name>mapred.reduce.tasks</name>
> >         <value>5</value>
> >    </property>
> > </configuration>
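> > 
> > I also notice I never set mapred.local.dir, so I think it falls back to 
> > ${hadoop.tmp.dir}/mapred/local under /tmp by default; maybe I should point 
> > it at a real disk instead, something like this (the path is just an example):
> > 
> > <property>
> >   <name>mapred.local.dir</name>
> >   <!-- example path only; pick a directory on a stable local disk -->
> >   <value>/home/shangan/hadoop/mapred/local</value>
> > </property>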
> > 
> > WHAT'S THE PROBLEM? Do I need to configure other parameters? There are 
> > parameters like dfs.secondary.http.address and dfs.datanode.address, whose 
> > IP is 0.0.0.0; do I need to change them?
> > 
> > 2010-08-18 
> > 
> > 
> > 
> > shangan 