S3/EC2 setup problem: port 9001 unreachable

2008-03-10 Thread Andreas Kostyrka
Hi!

I'm trying to set up a Hadoop 0.16.0 cluster on EC2/S3 (manually, not
using the Hadoop AMIs).

I've got the S3-based HDFS working, but I'm stumped when I try to get a
test job running:

[EMAIL PROTECTED]:~/hadoop-0.16.0$ time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar -mapper /tmp/test.sh -reducer cat -input testlogs/* -output testlogs-output
additionalConfSpec_:null
null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
packageJobJar: [/tmp/hadoop-hadoop/hadoop-unjar17969/] [] /tmp/streamjob17970.jar tmpDir=null
08/03/10 14:01:28 INFO mapred.FileInputFormat: Total input paths to process : 152
08/03/10 14:02:58 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hadoop/mapred/local]
08/03/10 14:02:58 INFO streaming.StreamJob: Running job: job_200803101400_0001
08/03/10 14:02:58 INFO streaming.StreamJob: To kill this job, run:
08/03/10 14:02:58 INFO streaming.StreamJob: /home/hadoop/hadoop-0.16.0/bin/../bin/hadoop job  -Dmapred.job.tracker=ec2-67-202-58-97.compute-1.amazonaws.com:9001 -kill job_200803101400_0001
08/03/10 14:02:58 INFO streaming.StreamJob: Tracking URL: http://ip-10-251-75-165.ec2.internal:50030/jobdetails.jsp?jobid=job_200803101400_0001
08/03/10 14:02:59 INFO streaming.StreamJob:  map 0%  reduce 0%

Furthermore, when I telnet to port 9001 on 10.251.75.165 from the master
host itself, it connects:
[EMAIL PROTECTED]:~/hadoop-0.16.0$ telnet 10.251.75.165 9001
Trying 10.251.75.165...
Connected to 10.251.75.165.
Escape character is '^]'.
^]
telnet> quit
Connection closed.

When I try the same from other VMs in my cluster, it just hangs
(tcpdump on the master host shows no activity on TCP port 9001):

[EMAIL PROTECTED]:~/hadoop-0.16.0$ telnet ip-10-251-75-165.ec2.internal 9001
Trying 10.251.75.165...

[EMAIL PROTECTED]:~/hadoop-0.16.0$ telnet ip-10-251-75-165.ec2.internal 22
Trying 10.251.75.165...
Connected to ip-10-251-75-165.ec2.internal.
Escape character is '^]'.
SSH-2.0-OpenSSH_4.3p2 Debian-9
^]
telnet> quit
Connection closed.
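The difference between the two telnet sessions above is the key symptom: port 22 completes the TCP handshake, while port 9001 neither connects nor gets refused. A refused connection would mean the host is reachable but nothing is listening; a silent hang usually means packets are being dropped on the way, which fits tcpdump seeing nothing on the master. A minimal Python sketch of that distinction follows; the function name `probe` and the 3-second timeout are illustrative choices, not anything from Hadoop or EC2.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect and classify the outcome.

    'open'    - handshake completed (telnet prints "Connected to ...").
    'refused' - the host answered with a reset: reachable, nothing listening.
    'timeout' - no answer at all: packets are probably being dropped,
                e.g. by a firewall or security group (the hanging telnet).
    'error'   - anything else (DNS failure, network unreachable, ...).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        # Must come before OSError, since socket.timeout is a subclass of it.
        return "timeout"
    except OSError:
        return "error"
```

Run from a slave, `probe("ip-10-251-75-165.ec2.internal", 9001)` would be expected to return "timeout" given the symptoms above, while the same call for port 22 would return "open".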

The same problem shows up when I connect to the JobTracker web interface
on port 50030: it reports 0 nodes ready to process the job.

Furthermore, the slaves show the following messages:
2008-03-10 15:30:11,455 INFO org.apache.hadoop.ipc.RPC: Problem connecting to server: ec2-67-202-58-97.compute-1.amazonaws.com/10.251.75.165:9001
2008-03-10 15:31:12,465 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ec2-67-202-58-97.compute-1.amazonaws.com/10.251.75.165:9001. Already tried 1 time(s).
2008-03-10 15:32:13,475 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ec2-67-202-58-97.compute-1.amazonaws.com/10.251.75.165:9001. Already tried 2 time(s).
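The slave log lines name the target as `hostname/ip:port`, so they also show that the slaves resolve the public name ec2-67-202-58-97.compute-1.amazonaws.com to the private address 10.251.75.165, the same address the master is listening on. The slaves are therefore aiming at the right host, and the retries do not point to a name-resolution problem. A small, hypothetical Python helper (`parse_connect_line` and `ConnectAttempt` are not part of Hadoop) that extracts those fields from such lines:

```python
import re
from typing import NamedTuple, Optional


class ConnectAttempt(NamedTuple):
    host: str   # name taken from mapred.job.tracker
    ip: str     # address that name resolved to on the slave
    port: int
    tries: int  # "Already tried N time(s)"; 0 for the first RPC message


# Matches "server: <host>/<ip>:<port>" plus an optional retry counter.
_PATTERN = re.compile(
    r"server: (?P<host>[^/\s]+)/(?P<ip>[\d.]+):(?P<port>\d+)"
    r"(?:\. Already tried (?P<tries>\d+) time\(s\))?"
)


def parse_connect_line(line: str) -> Optional[ConnectAttempt]:
    """Return the connection target of a Hadoop IPC log line, or None."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    return ConnectAttempt(
        host=m.group("host"),
        ip=m.group("ip"),
        port=int(m.group("port")),
        tries=int(m.group("tries") or 0),
    )
```

A steadily increasing `tries` with an unchanging `ip` is what one would expect when the right host is reached by name but the packets never get an answer.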

Last but not least, here is my site conf:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

<property>
  <name>fs.default.name</name>
  <value>s3://lookhad</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>2DFGTTFSDFDSZU5SDSD7S5202</value>
</property>

<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>RUWgsdfsd67SFDfsdflaI9Gjzfsd8789ksd2r1PtG</value>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>ec2-67-202-58-97.compute-1.amazonaws.com:9001</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
</configuration>
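The setting that tells the slaves where to connect for job scheduling is mapred.job.tracker. As a sketch (assuming a standard Hadoop site XML file as above; the function names are hypothetical), the host and port can be read with Python's standard xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def read_site_conf(xml_text: str) -> Dict[str, str]:
    """Return {name: value} for every <property> in a Hadoop site config."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def job_tracker_address(props: Dict[str, str]) -> Tuple[str, int]:
    """Split mapred.job.tracker ("host:port") into a (host, port) pair."""
    value = props["mapred.job.tracker"]
    if value == "local":
        # "local" means no JobTracker at all: jobs run in-process.
        raise ValueError("mapred.job.tracker is 'local'")
    host, _, port = value.rpartition(":")
    return host, int(port)
```

For the config above this yields ("ec2-67-202-58-97.compute-1.amazonaws.com", 9001), which is exactly the address the slaves keep retrying in their logs.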

The master node is not listening only on localhost:
[EMAIL PROTECTED]:~/hadoop-0.16.0$ netstat -an | grep 9001
tcp        0      0 10.251.75.165:9001      0.0.0.0:*               LISTEN
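The bind address in that netstat line matters: a daemon bound to 127.0.0.1 would accept only local connections, whereas one bound to 10.251.75.165 (or 0.0.0.0) can in principle accept connections from other hosts, so the JobTracker's bind address is not what blocks the slaves. A rough Python classifier for such lines, assuming the Linux net-tools column layout (proto, Recv-Q, Send-Q, local address, foreign address, state); `listen_scope` is a hypothetical name:

```python
from typing import Optional


def listen_scope(netstat_line: str) -> Optional[str]:
    """Classify the local address of a TCP LISTEN line from `netstat -an`.

    Returns 'loopback' (only local clients can connect), 'all' (bound to
    every interface), 'specific' (bound to one address), or None if the
    line is not a TCP LISTEN entry.
    """
    fields = netstat_line.split()
    if len(fields) < 6 or not fields[0].startswith("tcp") or fields[-1] != "LISTEN":
        return None
    local = fields[3]
    host = local.rsplit(":", 1)[0]  # strip the port, keep the address
    if host.startswith("127.") or host == "::1":
        return "loopback"
    if host in ("0.0.0.0", "::"):
        return "all"
    return "specific"
```

Either 'specific' or 'all' is fine for a JobTracker that slaves must reach; only 'loopback' would explain remote connections failing on its own.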

Any ideas? My conclusions so far are:

1.) It's not a general connectivity problem, because I can connect to
port 22 without any problems.
2.) OTOH, connectivity on port 9001 within the same group seems to be
restricted.
3.) All the AWS docs tell me that VMs in the same group have no firewall
between them.
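These three observations fit together if EC2 security groups are treated as default-deny inbound rule lists: traffic is admitted only when some rule matches the port and either the source address or the source's group. A group that opens port 22 to 0.0.0.0/0 but has no rule admitting its own members would let SSH through and silently drop port 9001, while the master's telnet to itself still succeeds because a connection to the host's own address never leaves the machine. The following is a toy Python model of that matching logic, not the EC2 API; the `Rule` type, the group name "hadoop" and the slave IP used with it are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """One hypothetical inbound rule: TCP ports [from_port, to_port]
    admitted from a CIDR block or from members of a named group."""
    from_port: int
    to_port: int
    cidr: Optional[str] = None          # e.g. "0.0.0.0/0"
    source_group: Optional[str] = None  # e.g. "hadoop"


def allowed(rules: List[Rule], src_ip: str, src_groups: List[str], port: int) -> bool:
    """True if any rule admits TCP traffic from this source to `port`.

    Anything not matched by a rule is dropped (default deny), which is
    why a blocked connection hangs instead of being refused.
    """
    addr = ipaddress.ip_address(src_ip)
    for r in rules:
        if not (r.from_port <= port <= r.to_port):
            continue
        if r.cidr is not None and addr in ipaddress.ip_network(r.cidr):
            return True
        if r.source_group is not None and r.source_group in src_groups:
            return True
    return False
```

With only `Rule(22, 22, cidr="0.0.0.0/0")`, a slave in group "hadoop" gets port 22 but not 9001; adding `Rule(0, 65535, source_group="hadoop")` admits 9001 as well.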

So what is happening here? Any ideas?

Andreas


signature.asc
Description: This is a digitally signed message part


Re: S3/EC2 setup problem: port 9001 unreachable

2008-03-10 Thread Andreas Kostyrka
Found it: it was a security group setup problem ;(

Andreas
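The thread does not say which rule was changed. Assuming the cause is the one Chris mentions in the next message (a group whose members cannot see each other), the usual fix is an inbound rule whose source is the group itself rather than an IP range. Below is a sketch using the present-day boto3 SDK (an assumption; in 2008 this was done with the EC2 command-line tools), with a hypothetical group ID. It defaults to the whole TCP port range because Hadoop nodes talk to each other on more than port 9001, for example the TaskTracker HTTP port (50060 by default) that serves map output to reducers.

```python
from typing import Any, Dict, List


def self_ingress_permissions(group_id: str, from_port: int = 0,
                             to_port: int = 65535) -> List[Dict[str, Any]]:
    """Build an IpPermissions list letting members of `group_id` reach
    each other on TCP ports [from_port, to_port]."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": from_port,
        "ToPort": to_port,
        # The source is the group itself, not a CIDR block:
        # this is the "let the group see itself" rule.
        "UserIdGroupPairs": [{"GroupId": group_id}],
    }]


def authorize_self(group_id: str, region: str) -> None:
    """Apply the rule via boto3. Needs AWS credentials; boto3 is imported
    here so the builder above works without it installed."""
    import boto3  # third-party AWS SDK (assumption, not 2008 tooling)

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=self_ingress_permissions(group_id),
    )
```

Calling `authorize_self("sg-0123456789abcdef0", "us-east-1")` (both values hypothetical) would add the rule; narrowing `from_port`/`to_port` is possible but then every Hadoop port in use has to be listed.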

On Monday, 10.03.2008, at 16:49 +0100, Andreas Kostyrka wrote:


Re: S3/EC2 setup problem: port 9001 unreachable

2008-03-10 Thread Chris K Wensel

Andreas

Here are some moderately useful notes on using EC2/S3, mostly learned
while leveraging Hadoop. The "groups can't see themselves" issue is
listed <grin>.


http://www.manamplified.org/archives/2008/03/notes-on-using-ec2-s3.html

enjoy
ckw

On Mar 10, 2008, at 9:51 AM, Andreas Kostyrka wrote:


Found it: it was a security group setup problem ;(

Andreas



Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/