Hi Todd,

Thanks for the suggestions. I checked netstat -a on the master, and it doesn't seem to indicate that port 50002 is in use by anybody:

--------------------------------
r...@domu-12-31-39-04-30-16 (/vol/hadoop-0.20.0/)> netstat -a |more
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address                Foreign Address              State
tcp        0      0 *:8649                       *:*                          LISTEN
tcp        0      0 *:8651                       *:*                          LISTEN
tcp        0      0 *:8652                       *:*                          LISTEN
tcp        0      0 localhost.localdomain:smtp   *:*                          LISTEN
tcp        0      0 localhost.localdomain:8649   localhost.localdomain:58683  TIME_WAIT
tcp        0      0 localhost.localdomain:8649   localhost.localdomain:58685  TIME_WAIT
tcp        0      0 localhost.localdomain:8649   localhost.localdomain:58684  TIME_WAIT
tcp        0      0 localhost.localdomain:8649   localhost.localdomain:58686  TIME_WAIT
tcp        0      0 *:http                       *:*                          LISTEN
tcp        0      0 domU-12-31-39-04-30-1:50001  *:*                          LISTEN
tcp        0      0 *:50070                      *:*                          LISTEN
tcp        0      0 *:ssh                        *:*                          LISTEN
tcp        0      0 *:42207                      *:*                          LISTEN
tcp        0     48 domU-12-31-39-04-30-16.:ssh  key1.docomolabs-usa.c:19829  ESTABLISHED
tcp        0      0 domU-12-31-39-04-30-1:50001  domU-12-31-39-04-1E-6:56434  ESTABLISHED
tcp        0      0 domU-12-31-39-04-30-1:50001  domU-12-31-39-04-30-C:51812  ESTABLISHED
udp        0      0 *:bootpc                     *:*
udp        0      0 *:8649                       *:*
udp        0      0 *:filenet-rpc                *:*
udp        0      0 *:filenet-nch                *:*
Active UNIX domain sockets (servers and established)
...
--------------------------------
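
If it helps narrow things down, I can also run a more targeted check on the master and report back -- something along these lines (the lsof line assumes lsof is installed on this AMI; otherwise the netstat line alone):

--------------------------------
# Any socket on port 50002, listening or otherwise
netstat -an | grep 50002

# Which process, if any, owns the port
lsof -i :50002
--------------------------------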

Also, below is the hadoop-site.xml on the master [auto-generated by the contrib/ec2 scripts and downloaded to the master and slaves]:

--------------------------------
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/mnt/hadoop</value>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://domU-12-31-39-04-30-16.compute-1.internal:50001</value>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>hdfs://domU-12-31-39-04-30-16.compute-1.internal:50002</value>
</property>

<property>
  <name>tasktracker.http.threads</name>
  <value>80</value>
</property>

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>3</value>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>3</value>
</property>

<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>

<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>

<property>
  <name>dfs.client.block.write.retries</name>
  <value>3</value>
</property>

</configuration>
--------------------------------

which shows that fs.default.name [the namenode] and mapred.job.tracker [the jobtracker] are assigned different ports [50001 and 50002, respectively].
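
One other thing I noticed in the jobtracker log below is the warning that hadoop-site.xml is deprecated in 0.20 in favor of core-site.xml, mapred-site.xml and hdfs-site.xml. I don't know whether that is related to the bind problem, but for reference, here is a rough sketch of how I could split the same settings across the new files. The values are copied from the config above, except that I dropped the hdfs:// prefix on mapred.job.tracker, since the examples I have seen use plain host:port there (not sure if that matters); I have not actually tried this layout yet:

--------------------------------
<!-- core-site.xml -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/hadoop</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://domU-12-31-39-04-30-16.compute-1.internal:50001</value>
  </property>
</configuration>

<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- plain host:port here, instead of the hdfs:// URI used in my hadoop-site.xml -->
    <value>domU-12-31-39-04-30-16.compute-1.internal:50002</value>
  </property>
  <property>
    <name>tasktracker.http.threads</name>
    <value>80</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>3</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>3</value>
  </property>
  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.client.block.write.retries</name>
    <value>3</value>
  </property>
</configuration>
--------------------------------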

Would appreciate any thoughts...

Thanks,
Jeyendran


-----Original Message-----
From: Todd Lipcon [mailto:[email protected]]
Sent: Monday, July 20, 2009 2:31 PM
To: [email protected]
Subject: Re: Unable to start Hadoop mapred cluster on EC2 with Hadoop 0.20.0

Hi Jeyendran,

Is it possible that you've configured the jobtracker's RPC address (mapred.job.tracker) to be the same as its HTTP address? The "Address already in use" error indicates that someone is already claiming port 50002. That might be another daemon on the same machine, or it could be another port in use by the JT.

-Todd

On Mon, Jul 20, 2009 at 2:20 PM, Jeyendran Balakrishnan <[email protected]> wrote:

> Hello,
>
> I downloaded Hadoop 0.20.0 and used the src/contrib/ec2/bin scripts to
> launch a Hadoop cluster on Amazon EC2. To do so, I modified the bundled
> scripts above for my EC2 account, and then created my own Hadoop 0.20.0
> AMI. The steps I followed for creating AMIs and launching EC2 Hadoop
> clusters are the same ones I had been using for over a year with Hadoop
> 0.18.* and 0.19.*.
>
> I launched an instance with my new Hadoop 0.20.0 AMI, then logged in and
> ran the following to launch a new cluster:
>
> root(/vol/hadoop-0.20.0)> bin/launch-hadoop-cluster hadoop-test 2
>
> After the usual EC2 wait, one master and two slave instances were
> launched on EC2, as expected. When I ssh'ed into the instances, here is
> what I found:
>
> Slaves: DataNode and NameNode are running
> Master: Only NameNode is running
>
> I could use HDFS commands (using the $HADOOP_HOME/bin/hadoop scripts)
> without any problems, from both master and slaves. However, since the
> JobTracker is not running, I cannot run map-reduce jobs.
>
> I checked the logs from /vol/hadoop-0.20.0/logs for the JobTracker,
> reproduced below:
>
> -----------------------------------------------
> <<<
> 2009-07-20 16:56:30,273 WARN org.apache.hadoop.conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> 2009-07-20 16:56:30,320 INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting JobTracker
> STARTUP_MSG:   host = domU-12-31-39-04-30-16/10.240.55.228
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.0
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.20 -r 763504; compiled by 'ndaley' on Thu Apr 9 05:18:40 UTC 2009
> ************************************************************/
> 2009-07-20 16:56:31,332 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=JobTracker, port=50002
> 2009-07-20 16:56:31,603 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> 2009-07-20 16:56:31,900 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50030
> 2009-07-20 16:56:31,900 INFO org.mortbay.log: jetty-6.1.14
> 2009-07-20 16:56:33,461 INFO org.mortbay.log: Started [email protected]:50030
> 2009-07-20 16:56:33,462 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 2009-07-20 16:56:33,531 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 50002
> 2009-07-20 16:56:33,532 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030
> 2009-07-20 16:56:51,554 INFO org.apache.hadoop.mapred.JobTracker: Cleaning up the system directory
> 2009-07-20 16:56:53,060 INFO org.apache.hadoop.hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /mnt/hadoop/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:739)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>         at $Proxy4.addBlock(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy4.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2873)
> ...
> ...
> 2009-07-20 16:56:55,878 WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping /mnt/hadoop/mapred/system/jobtracker.info retries left 1
> 2009-07-20 16:56:59,082 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /mnt/hadoop/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256)
> ...
> ...
>
> 2009-07-20 16:57:00,092 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to domU-12-31-39-04-30-16.compute-1.internal/10.240.55.228:50002 : Address already in use
>         at org.apache.hadoop.ipc.Server.bind(Server.java:190)
>         at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:253)
>         at org.apache.hadoop.ipc.Server.<init>(Server.java:1026)
>         at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:488)
>         at org.apache.hadoop.ipc.RPC.getServer(RPC.java:450)
>         at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1537)
>         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:174)
>         at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3528)
> Caused by: java.net.BindException: Address already in use
>         at sun.nio.ch.Net.bind(Native Method)
>         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>         at org.apache.hadoop.ipc.Server.bind(Server.java:188)
>         ... 7 more
>
> 2009-07-20 16:57:00,093 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down JobTracker at domU-12-31-39-04-30-16/10.240.55.228
> ************************************************************/
> >>>
> -----------------------------------------------
>
> So it looks like the JobTracker launched, but then died while trying to
> replicate the jobtracker.info file to one or more slaves.
>
> Would appreciate any help with this...
>
> Thanks a lot,
> jp
>
>
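
P.S. One more data point I can gather, given the "could only be replicated to 0 nodes, instead of 1" errors in the log above: the next time I launch a cluster, I can check whether the namenode actually sees any live datanodes around the time the jobtracker starts. Something along these lines, run from $HADOOP_HOME on the master (standard HDFS commands, nothing cluster-specific):

--------------------------------
# How many live datanodes has the namenode registered?
bin/hadoop dfsadmin -report

# Overall HDFS health, including under-replicated blocks
bin/hadoop fsck /
--------------------------------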
