S3/EC2 setup problem: port 9001 unreachable

Andreas Kostyrka Mon, 10 Mar 2008 08:49:40 -0700

Hi!

I'm trying to set up a Hadoop 0.16.0 cluster on EC2/S3 (manually, not
using the Hadoop AMIs).


I've got the S3 based HDFS working, but I'm stumped when I try to get a
test job running:

[EMAIL PROTECTED]:~/hadoop-0.16.0$ time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar -mapper /tmp/test.sh -reducer cat -input testlogs/* -output testlogs-output
additionalConfSpec_:null
null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
packageJobJar: [/tmp/hadoop-hadoop/hadoop-unjar17969/] [] /tmp/streamjob17970.jar tmpDir=null
08/03/10 14:01:28 INFO mapred.FileInputFormat: Total input paths to process : 152
08/03/10 14:02:58 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hadoop/mapred/local]
08/03/10 14:02:58 INFO streaming.StreamJob: Running job: job_200803101400_0001
08/03/10 14:02:58 INFO streaming.StreamJob: To kill this job, run:
08/03/10 14:02:58 INFO streaming.StreamJob: /home/hadoop/hadoop-0.16.0/bin/../bin/hadoop job  -Dmapred.job.tracker=ec2-67-202-58-97.compute-1.amazonaws.com:9001 -kill job_200803101400_0001
08/03/10 14:02:58 INFO streaming.StreamJob: Tracking URL: http://ip-10-251-75-165.ec2.internal:50030/jobdetails.jsp?jobid=job_200803101400_0001
08/03/10 14:02:59 INFO streaming.StreamJob:  map 0%  reduce 0%

Furthermore, when I try to connect to port 9001 on 10.251.75.165 via telnet
from the master host itself, it connects:
[EMAIL PROTECTED]:~/hadoop-0.16.0$ telnet 10.251.75.165 9001
Trying 10.251.75.165...
Connected to 10.251.75.165.
Escape character is '^]'.
^]
telnet> quit
Connection closed.

When I try the same from other VMs in my cluster, it just hangs
(tcpdump on the master host shows no activity for TCP port 9001), while
port 22 connects fine:

[EMAIL PROTECTED]:~/hadoop-0.16.0$ telnet ip-10-251-75-165.ec2.internal 9001
Trying 10.251.75.165...

[EMAIL PROTECTED]:~/hadoop-0.16.0$ telnet ip-10-251-75-165.ec2.internal 22
Trying 10.251.75.165...
Connected to ip-10-251-75-165.ec2.internal.
Escape character is '^]'.
SSH-2.0-OpenSSH_4.3p2 Debian-9
^]
telnet> quit
Connection closed.

The JobTracker web interface on port 50030 shows the same picture: 0 nodes
are ready to process the job.
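
The telnet checks above can also be scripted so they are easy to repeat from every slave. What follows is only a minimal sketch in Python, not part of my actual setup: the function name port_reachable and the 5 second timeout are assumptions chosen for illustration.

```python
import socket


def port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and name resolution failures
        return False
```

Run from a slave, port_reachable("ip-10-251-75-165.ec2.internal", 22) and the same call with port 9001 would be expected to mirror the two telnet sessions: the first succeeds, the second gives up after the timeout instead of hanging.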

Furthermore, the slaves show the following messages:
2008-03-10 15:30:11,455 INFO org.apache.hadoop.ipc.RPC: Problem connecting to server: ec2-67-202-58-97.compute-1.amazonaws.com/10.251.75.165:9001
2008-03-10 15:31:12,465 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ec2-67-202-58-97.compute-1.amazonaws.com/10.251.75.165:9001. Already tried 1 time(s).
2008-03-10 15:32:13,475 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ec2-67-202-58-97.compute-1.amazonaws.com/10.251.75.165:9001. Already tried 2 time(s).

Last but not least, here is my site conf:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

<property>
  <name>fs.default.name</name>
  <value>s3://lookhad</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>2DFGTTFSDFDSZU5SDSD7S5202</value>
</property>

<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>RUWgsdfsd67SFDfsdflaI9Gjzfsd8789ksd2r1PtG</value>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>ec2-67-202-58-97.compute-1.amazonaws.com:9001</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
</configuration>
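
To be sure that every node is checked against the address that is really configured, the mapred.job.tracker value can be read straight from the site configuration. This is a rough sketch under assumptions: the function name job_tracker_address is made up for illustration, and it expects the XML text of the site configuration shown above to be passed in as a string.

```python
import xml.etree.ElementTree as ET


def job_tracker_address(site_xml):
    """Return (host, port) from mapred.job.tracker, or None if absent or set to "local"."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == "mapred.job.tracker":
            value = (prop.findtext("value") or "").strip()
            # "local" means jobs run in-process, so there is no address to check
            if value == "local" or ":" not in value:
                return None
            host, _, port = value.rpartition(":")
            return host, int(port)
    return None
```

With the configuration above this yields the host ec2-67-202-58-97.compute-1.amazonaws.com and port 9001, the same pair that appears in the slave log messages.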

The master node is listening on its internal address, not on localhost:
[EMAIL PROTECTED]:~/hadoop-0.16.0$ netstat -an | grep 9001
tcp        0      0 10.251.75.165:9001      0.0.0.0:*               LISTEN     
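
The same check of the bind address can be automated by parsing the netstat output. The sketch below is illustrative only: the function name listening_addresses is an assumption, and it assumes the six-column `netstat -an` layout seen in the line above (protocol, two queue counters, local address, foreign address, state).

```python
def listening_addresses(netstat_output, port):
    """Return the local addresses that have a TCP listener on the given port."""
    addresses = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # expected columns: proto, recv-q, send-q, local address, foreign address, state
        if len(fields) < 6 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        host, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            addresses.append(host)
    return addresses
```

A result containing only 127.0.0.1 would mean the listener is bound to loopback; for the output above the only entry is 10.251.75.165.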

Any ideas? My conclusions so far are:

1.) First, it's not a general connectivity problem, because I can connect to
port 22 without any problems.
2.) OTOH, on port 9001, inside the same group, the connectivity seems to be
limited.
3.) All AWS docs tell me that VMs in one group have no firewall between them.

So what is happening here? Any ideas?

Andreas

