Re: RE: mapreduce doesn't work in my cluster

Harsh J Wed, 18 Aug 2010 07:12:56 -0700

2010/8/18 shangan <[email protected]>:
> Can I have extra IP-to-host mappings in the hosts file? Take 192.168.0.148,
> for example: will the extra entries stop vm148 from working?
That should be alright. A better way, though, would be to add the extra
names as aliases on the same line as vm148.
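
To illustrate the alias idea, here is a minimal Python sketch that collapses
repeated IPs in hosts-file text into one line per IP, with the extra names
kept as aliases. It is only an illustration under my own assumptions: the
function name merge_host_aliases is made up for this example and is not part
of Hadoop or any system tool, and the sample entries are taken from the hosts
file quoted below.

```python
from collections import OrderedDict


def merge_host_aliases(hosts_text):
    """Collapse repeated IPs in hosts-file text into one line per IP.

    Hypothetical helper for illustration only; not a Hadoop or OS utility.
    """
    names_by_ip = OrderedDict()
    for raw in hosts_text.splitlines():
        # Drop trailing comments and skip blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        bucket = names_by_ip.setdefault(ip, [])
        for name in names:
            if name not in bucket:
                bucket.append(name)
    # First name seen for an IP stays first; later names become aliases.
    return "\n".join(
        f"{ip}\t{' '.join(names)}" for ip, names in names_by_ip.items()
    )


sample = """\
192.168.0.153           vm153
192.168.0.148           vm148
192.168.0.148           s.com.cn
192.168.0.148           s1.com.cn
"""
print(merge_host_aliases(sample))
```

With the sample above, the three 192.168.0.148 lines end up as a single line
that lists vm148 first, followed by s.com.cn and s1.com.cn.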
> 127.0.0.1               localhost.localdomain localhost
> ::1             localhost6.localdomain6 localhost6
> 192.168.0.153           vm153
> 192.168.0.148           vm148
> 192.168.0.152           vm152
> 192.168.0.154           vm154
> 192.168.0.148           s.com.cn
> 192.168.0.148           s1.com.cn
>
> Must the hosts file contain all nodes and no other records, and must all
> nodes have the same configuration?
>
>
> 2010-08-18
>
>
>
> shangan
>
>
>
> From: xiujin yang
> Sent: 2010-08-18  18:11:10
> To: [email protected]
> Cc:
> Subject: RE: mapreduce doesn't work in my cluster
>
> Hi Shangan
> I think the problem is caused by a configuration issue.
> 1. Make sure the namenode and the datanodes all have the same hosts
>    configuration.
> 2. All *.xml files should be checked.
>> > WHAT'S THE PROBLEM? Do I need to configure other parameters?
>> > There are parameters like
>> > dfs.secondary.http.address and dfs.datanode.address, whose IP is
>> > 0.0.0.0. Do I need to change them?
> No, the defaults will be OK.
> Best,
> Yang.
>> Date: Wed, 18 Aug 2010 17:16:42 +0800
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: RE: mapreduce doesn't work in my cluster
>>
>> 127.0.0.1               localhost.localdomain localhost
>> ::1             localhost6.localdomain6 localhost6
>> 192.168.0.153           vm153
>> 192.168.0.148           vm148
>> 192.168.0.152           vm152
>> 192.168.0.154           vm154
>>
>> vm153, vm148, vm152 and vm154 are the nodes I'm using in the cluster. Each
>> node also has some other IP-to-host mappings for other uses, and I don't
>> know whether they will have any effect. I look forward to your further
>> help; I really appreciate it.
>>
>>
>> 2010-08-18
>>
>>
>>
>> shangan
>>
>>
>>
>> From: xiujin yang
>> Sent: 2010-08-18  17:08:27
>> To: [email protected]
>> Cc:
>> Subject: RE: mapreduce doesn't work in my cluster
>>
>> Hi Shangan,
>> Please check whether all machines are set up in your /etc/hosts.
>> Best,
>> Yang.
>> > Date: Wed, 18 Aug 2010 15:01:46 +0800
>> > From: [email protected]
>> > To: [email protected]
>> > Subject: mapreduce doesn't work in my cluster
>> >
>> > My cluster consists of 4 nodes: 1 namenode and 3 datanodes. It works well
>> > as HDFS, but when I run mapreduce tasks they take quite a long time and
>> > there are a lot of "too many fetch-failures" errors. I've checked the log
>> > on a datanode and copied part of it below:
>> >
>> >
>> >
>> > 2010-08-18 14:28:33,142 WARN org.apache.hadoop.mapred.TaskTracker: Unknown 
>> > child with bad map output: attempt_201008171837_0007_m_000006_1. Ignored.
>> > 2010-08-18 14:28:33,143 INFO 
>> > org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.0.1:50060, 
>> > dest: 127.0.0.1:54245, bytes: 0, op: MAPRED_SHUFFLE, cliID: 
>> > attempt_201008171837_0007_m_000006_1
>> > 2010-08-18 14:28:33,143 WARN org.mortbay.log: /mapOutput: 
>> > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
>> > taskTracker/jobcache/job_201008171837_0007/attempt_201008171837_0007_m_000006_1/output/file.out.index
>> >  in any of the configured local directories
>> > 2010-08-18 14:28:34,766 INFO org.apache.hadoop.mapred.TaskTracker: 
>> > attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 
>> > at 0.00 MB/s) >
>> > 2010-08-18 14:28:37,675 INFO org.apache.hadoop.mapred.TaskTracker: 
>> > attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 
>> > at 0.00 MB/s) >
>> > 2010-08-18 14:28:40,775 INFO org.apache.hadoop.mapred.TaskTracker: 
>> > attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 
>> > at 0.00 MB/s) >
>> > 2010-08-18 14:28:43,683 INFO org.apache.hadoop.mapred.TaskTracker: 
>> > attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 
>> > at 0.00 MB/s) >
>> > 2010-08-18 14:28:43,779 INFO org.apache.hadoop.mapred.TaskTracker: 
>> > attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 
>> > at 0.00 MB/s) >
>> > 2010-08-18 14:28:46,687 INFO org.apache.hadoop.mapred.TaskTracker: 
>> > attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 
>> > at 0.00 MB/s) >
>> > 2010-08-18 14:28:49,787 INFO org.apache.hadoop.mapred.TaskTracker: 
>> > attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 
>> > at 0.00 MB/s) >
>> > 2010-08-18 14:28:52,696 INFO org.apache.hadoop.mapred.TaskTracker: 
>> > attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 
>> > at 0.00 MB/s) >
>> > 2010-08-18 14:28:55,796 INFO org.apache.hadoop.mapred.TaskTracker: 
>> > attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 
>> > at 0.00 MB/s) >
>> > 2010-08-18 14:28:58,704 INFO org.apache.hadoop.mapred.TaskTracker: 
>> > attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 
>> > at 0.00 MB/s) >
>> > 2010-08-18 14:28:58,800 INFO org.apache.hadoop.mapred.TaskTracker: 
>> > attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 
>> > at 0.00 MB/s) >
>> > 2010-08-18 14:29:01,710 INFO org.apache.hadoop.mapred.TaskTracker: 
>> > attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 
>> > at 0.00 MB/s) >
>> > 2010-08-18 14:29:04,808 INFO org.apache.hadoop.mapred.TaskTracker: 
>> > attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 
>> > at 0.00 MB/s) >
>> > 2010-08-18 14:29:05,225 WARN org.apache.hadoop.mapred.TaskTracker: 
>> > getMapOutput(attempt_201008171837_0007_m_000006_1,0) failed :
>> > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
>> > taskTracker/jobcache/job_201008171837_0007/attempt_201008171837_0007_m_000006_1/output/file.out.index
>> >  in any of the configured local directories
>> >         at 
>> > org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
>> >         at 
>> > org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
>> >         at 
>> > org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2887)
>> >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>> >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>> >         at 
>> > org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
>> >         at 
>> > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
>> >         at 
>> > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>> >         at 
>> > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>> >         at 
>> > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>> >         at 
>> > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
>> >         at 
>> > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>> >         at 
>> > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>> >         at org.mortbay.jetty.Server.handle(Server.java:324)
>> >         at 
>> > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
>> >         at 
>> > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
>> >         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
>> >         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
>> >         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
>> >         at 
>> > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
>> >         at 
>> > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
>> > 2010-08-18 14:29:05,225 WARN org.apache.hadoop.mapred.TaskTracker: Unknown 
>> > child with bad map output: attempt_201008171837_0007_m_000006_1. Ignored.
>> > 2010-08-18 14:29:05,259 INFO 
>> > org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.0.1:50060, 
>> > dest: 127.0.0.1:54288, bytes: 0, op: MAPRED_SHUFFLE, cliID: 
>> > attempt_201008171837_0007_m_000006_1
>> > 2010-08-18 14:29:05,259 WARN org.mortbay.log: /mapOutput: 
>> > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
>> > taskTracker/jobcache/job_201008171837_0007/attempt_201008171837_0007_m_000006_1/output/file.out.index
>> >  in any of the configured local directories
>> >
>> >
>> >
>> > Almost all datanodes behave the same way. It seems the reduce tasks can't
>> > get the map results from the other datanodes. I also looked at the charts
>> > from the job Administrator, and the copy phase did last quite a long
>> > time. Can anybody give me an explanation? My hadoop-0.20.2 configuration
>> > follows:
>> >
>> > core-site.xml
>> >
>> > <?xml version="1.0"?>
>> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>> > <!-- Put site-specific property overrides in this file. -->
>> > <configuration>
>> >    <property>
>> >         <name>fs.default.name</name>
>> >         <value>hdfs://vm153:9000</value>
>> >    </property>
>> >    <property>
>> >         <name>fs.trash.interval</name>
>> >         <value>20</value>
>> >    </property>
>> > <property>
>> >   <name>fs.checkpoint.period</name>
>> >   <value>300</value>
>> >   <description>The number of seconds between two periodic checkpoints.
>> >   </description>
>> > </property>
>> > </configuration>
>> >
>> > hdfs-site.xml
>> >
>> > [shan...@vm153 conf]$ more hdfs-site.xml
>> > <?xml version="1.0"?>
>> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>> > <!-- Put site-specific property overrides in this file. -->
>> > <configuration>
>> > <property>
>> >   <name>dfs.replication</name>
>> >   <value>2</value>
>> > </property>
>> > <property>
>> >   <name>dfs.hosts.exclude</name>
>> >   <value>/home/shangan/bin/hadoop-0.20.2/conf/exclude</value>
>> > </property>
>> > </configuration>
>> >
>> > mapred-site.xml
>> >
>> > <?xml version="1.0"?>
>> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>> > <!-- Put site-specific property overrides in this file. -->
>> > <configuration>
>> >    <property>
>> >         <name>mapred.job.tracker</name>
>> >         <value>vm153:9001</value>
>> >    </property>
>> >    <property>
>> >         <name>mapred.map.tasks</name>
>> >         <value>20</value>
>> >    </property>
>> >    <property>
>> >         <name>mapred.reduce.tasks</name>
>> >         <value>5</value>
>> >    </property>
>> > </configuration>
>> >
>> > WHAT'S THE PROBLEM? Do I need to configure other parameters? There are
>> > parameters like dfs.secondary.http.address and dfs.datanode.address whose
>> > IP is 0.0.0.0. Do I need to change them?
>> >
>> > 2010-08-18
>> >
>> >
>> >
>> > shangan
>>
>
>




-- 
Harsh J
www.harshj.com
