Hi Todd,

Thanks for the suggestions. I checked netstat -a on the master, and it doesn't seem to indicate that port 50002 is in use by anybody:

--------------------------------
r...@domu-12-31-39-04-30-16 (/vol/hadoop-0.20.0/)> netstat -a |more
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address                Foreign Address              State
tcp        0      0 *:8649                       *:*                          LISTEN
tcp        0      0 *:8651                       *:*                          LISTEN
tcp        0      0 *:8652                       *:*                          LISTEN
tcp        0      0 localhost.localdomain:smtp   *:*                          LISTEN
tcp        0      0 localhost.localdomain:8649   localhost.localdomain:58683  TIME_WAIT
tcp        0      0 localhost.localdomain:8649   localhost.localdomain:58685  TIME_WAIT
tcp        0      0 localhost.localdomain:8649   localhost.localdomain:58684  TIME_WAIT
tcp        0      0 localhost.localdomain:8649   localhost.localdomain:58686  TIME_WAIT
tcp        0      0 *:http                       *:*                          LISTEN
tcp        0      0 domU-12-31-39-04-30-1:50001  *:*                          LISTEN
tcp        0      0 *:50070                      *:*                          LISTEN
tcp        0      0 *:ssh                        *:*                          LISTEN
tcp        0      0 *:42207                      *:*                          LISTEN
tcp        0     48 domU-12-31-39-04-30-16.:ssh  key1.docomolabs-usa.c:19829  ESTABLISHED
tcp        0      0 domU-12-31-39-04-30-1:50001  domU-12-31-39-04-1E-6:56434  ESTABLISHED
tcp        0      0 domU-12-31-39-04-30-1:50001  domU-12-31-39-04-30-C:51812  ESTABLISHED
udp        0      0 *:bootpc                     *:*
udp        0      0 *:8649                       *:*
udp        0      0 *:filenet-rpc                *:*
udp        0      0 *:filenet-nch                *:*
Active UNIX domain sockets (servers and established)
...
--------------------------------
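
If it helps narrow things down, I can also run a more targeted check on the master and report back -- something along these lines (the lsof line assumes lsof is installed on this AMI; otherwise the netstat line alone):

--------------------------------
# Any socket on port 50002, listening or otherwise
netstat -an | grep 50002

# Which process, if any, owns the port
lsof -i :50002
--------------------------------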

Also, below is the hadoop-site.xml on the master [auto-generated by the contrib/ec2 scripts and downloaded to the master and slaves]:

--------------------------------
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/mnt/hadoop</value>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://domU-12-31-39-04-30-16.compute-1.internal:50001</value>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>hdfs://domU-12-31-39-04-30-16.compute-1.internal:50002</value>
</property>

<property>
  <name>tasktracker.http.threads</name>
  <value>80</value>
</property>

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>3</value>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>3</value>
</property>

<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>

<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>

<property>
  <name>dfs.client.block.write.retries</name>
  <value>3</value>
</property>

</configuration>
--------------------------------

which shows that fs.default.name [the namenode] and mapred.job.tracker [the jobtracker] are assigned different ports [50001 and 50002, respectively].
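
One other thing I noticed in the jobtracker log below is the warning that hadoop-site.xml is deprecated in 0.20 in favor of core-site.xml, mapred-site.xml and hdfs-site.xml. I don't know whether that is related to the bind problem, but for reference, here is a rough sketch of how I could split the same settings across the new files. The values are copied from the config above, except that I dropped the hdfs:// prefix on mapred.job.tracker, since the examples I have seen use plain host:port there (not sure if that matters); I have not actually tried this layout yet:

--------------------------------
<!-- core-site.xml -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/hadoop</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://domU-12-31-39-04-30-16.compute-1.internal:50001</value>
  </property>
</configuration>

<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- plain host:port here, instead of the hdfs:// URI used in my hadoop-site.xml -->
    <value>domU-12-31-39-04-30-16.compute-1.internal:50002</value>
  </property>
  <property>
    <name>tasktracker.http.threads</name>
    <value>80</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>3</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>3</value>
  </property>
  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.client.block.write.retries</name>
    <value>3</value>
  </property>
</configuration>
--------------------------------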

Would appreciate any thoughts...

Thanks,
Jeyendran


-----Original Message-----
From: Todd Lipcon [mailto:[email protected]]
Sent: Monday, July 20, 2009 2:31 PM
To: [email protected]
Subject: Re: Unable to start Hadoop mapred cluster on EC2 with Hadoop 0.20.0

Hi Jeyendran,

Is it possible that you've configured the jobtracker's RPC address (mapred.job.tracker) to be the same as its HTTP address? The "Address already in use" error indicates that someone is already claiming port 50002. That might be another daemon on the same machine, or it could be another port in use by the JT.

-Todd

On Mon, Jul 20, 2009 at 2:20 PM, Jeyendran Balakrishnan <[email protected]> wrote:

> Hello,
>
> I downloaded Hadoop 0.20.0 and used the src/contrib/ec2/bin scripts to
> launch a Hadoop cluster on Amazon EC2. To do so, I modified the bundled
> scripts above for my EC2 account, and then created my own Hadoop 0.20.0
> AMI. The steps I followed for creating AMIs and launching EC2 Hadoop
> clusters are the same ones I had been using for over a year with Hadoop
> 0.18.* and 0.19.*.
>
> I launched an instance with my new Hadoop 0.20.0 AMI, then logged in and
> ran the following to launch a new cluster:
>
> root(/vol/hadoop-0.20.0)> bin/launch-hadoop-cluster hadoop-test 2
>
> After the usual EC2 wait, one master and two slave instances were
> launched on EC2, as expected. When I ssh'ed into the instances, here is
> what I found:
>
> Slaves: DataNode and NameNode are running
> Master: Only NameNode is running
>
> I could use HDFS commands (using the $HADOOP_HOME/bin/hadoop scripts)
> without any problems, from both master and slaves. However, since the
> JobTracker is not running, I cannot run map-reduce jobs.
>
> I checked the logs from /vol/hadoop-0.20.0/logs for the JobTracker,
> reproduced below:
>
> -----------------------------------------------
> <<<
> 2009-07-20 16:56:30,273 WARN org.apache.hadoop.conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> 2009-07-20 16:56:30,320 INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting JobTracker
> STARTUP_MSG:   host = domU-12-31-39-04-30-16/10.240.55.228
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.0
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.20 -r 763504; compiled by 'ndaley' on Thu Apr 9 05:18:40 UTC 2009
> ************************************************************/
> 2009-07-20 16:56:31,332 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=JobTracker, port=50002
> 2009-07-20 16:56:31,603 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> 2009-07-20 16:56:31,900 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50030
> 2009-07-20 16:56:31,900 INFO org.mortbay.log: jetty-6.1.14
> 2009-07-20 16:56:33,461 INFO org.mortbay.log: Started [email protected]:50030
> 2009-07-20 16:56:33,462 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 2009-07-20 16:56:33,531 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 50002
> 2009-07-20 16:56:33,532 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030
> 2009-07-20 16:56:51,554 INFO org.apache.hadoop.mapred.JobTracker: Cleaning up the system directory
> 2009-07-20 16:56:53,060 INFO org.apache.hadoop.hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /mnt/hadoop/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:739)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>         at $Proxy4.addBlock(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy4.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2873)
> ...
> ...
> 2009-07-20 16:56:55,878 WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping /mnt/hadoop/mapred/system/jobtracker.info retries left 1
> 2009-07-20 16:56:59,082 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /mnt/hadoop/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256)
> ...
> ...
>
> 2009-07-20 16:57:00,092 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to domU-12-31-39-04-30-16.compute-1.internal/10.240.55.228:50002 : Address already in use
>         at org.apache.hadoop.ipc.Server.bind(Server.java:190)
>         at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:253)
>         at org.apache.hadoop.ipc.Server.<init>(Server.java:1026)
>         at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:488)
>         at org.apache.hadoop.ipc.RPC.getServer(RPC.java:450)
>         at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1537)
>         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:174)
>         at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3528)
> Caused by: java.net.BindException: Address already in use
>         at sun.nio.ch.Net.bind(Native Method)
>         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>         at org.apache.hadoop.ipc.Server.bind(Server.java:188)
>         ... 7 more
>
> 2009-07-20 16:57:00,093 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down JobTracker at domU-12-31-39-04-30-16/10.240.55.228
> ************************************************************/
> >>>
> -----------------------------------------------
>
> So it looks like the JobTracker launched, but then died while trying to
> replicate the jobtracker.info file to one or more slaves.
>
> Would appreciate any help with this...
>
> Thanks a lot,
> jp
>
>
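
P.S. One more data point I can gather, given the "could only be replicated to 0 nodes, instead of 1" errors in the log above: the next time I launch a cluster, I can check whether the namenode actually sees any live datanodes around the time the jobtracker starts. Something along these lines, run from $HADOOP_HOME on the master (standard HDFS commands, nothing cluster-specific):

--------------------------------
# How many live datanodes has the namenode registered?
bin/hadoop dfsadmin -report

# Overall HDFS health, including under-replicated blocks
bin/hadoop fsck /
--------------------------------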
