Hi! I'm trying to set up a Hadoop 0.16.0 cluster on EC2/S3 (manually, not using the Hadoop AMIs).
I've got the S3-based HDFS working, but I'm stumped when I try to get a test job running:

[EMAIL PROTECTED]:~/hadoop-0.16.0$ time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar -mapper /tmp/test.sh -reducer cat -input testlogs/* -output testlogs-output
additionalConfSpec_:null
null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
packageJobJar: [/tmp/hadoop-hadoop/hadoop-unjar17969/] [] /tmp/streamjob17970.jar tmpDir=null
08/03/10 14:01:28 INFO mapred.FileInputFormat: Total input paths to process : 152
08/03/10 14:02:58 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hadoop/mapred/local]
08/03/10 14:02:58 INFO streaming.StreamJob: Running job: job_200803101400_0001
08/03/10 14:02:58 INFO streaming.StreamJob: To kill this job, run:
08/03/10 14:02:58 INFO streaming.StreamJob: /home/hadoop/hadoop-0.16.0/bin/../bin/hadoop job -Dmapred.job.tracker=ec2-67-202-58-97.compute-1.amazonaws.com:9001 -kill job_200803101400_0001
08/03/10 14:02:58 INFO streaming.StreamJob: Tracking URL: http://ip-10-251-75-165.ec2.internal:50030/jobdetails.jsp?jobid=job_200803101400_0001
08/03/10 14:02:59 INFO streaming.StreamJob: map 0% reduce 0%

Furthermore, when I connect to port 9001 on 10.251.75.165 via telnet from the master host itself, it connects:

[EMAIL PROTECTED]:~/hadoop-0.16.0$ telnet 10.251.75.165 9001
Trying 10.251.75.165...
Connected to 10.251.75.165.
Escape character is '^]'.
^]
telnet> quit
Connection closed.

When I try to do this from other VMs in my cluster, it just hangs (tcpdump on the master host shows no activity for TCP port 9001):

[EMAIL PROTECTED]:~/hadoop-0.16.0$ telnet ip-10-251-75-165.ec2.internal 9001
Trying 10.251.75.165...

Port 22, by contrast, connects fine from the same VM:

[EMAIL PROTECTED]:~/hadoop-0.16.0$ telnet ip-10-251-75-165.ec2.internal 22
Trying 10.251.75.165...
Connected to ip-10-251-75-165.ec2.internal.
Escape character is '^]'.
SSH-2.0-OpenSSH_4.3p2 Debian-9
^]
telnet> quit
Connection closed.
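
The manual telnet checks above could be scripted so they are easy to repeat from every VM. What follows is only a minimal sketch in Python, not part of the actual setup: the function names `can_connect` and `check_ports` and the 5-second timeout are assumptions made for illustration.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Any failure (refused, timed out, unresolvable host name) counts as False.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.timeout and socket.gaierror are both subclasses of OSError.
        return False


def check_ports(host, ports, timeout=5.0):
    """Probe several ports on one host and return a {port: reachable} dict."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

Running `check_ports("ip-10-251-75-165.ec2.internal", [22, 9001, 50030])` on the master and on each slave would give a quick side-by-side view of which ports are reachable from where. A timeout (rather than an immediate refusal) on a port would be consistent with packets being dropped before they reach the master.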
The same problem shows up when I connect to port 50030: the page reports 0 nodes ready to process the job. Furthermore, the slaves log the following messages:

2008-03-10 15:30:11,455 INFO org.apache.hadoop.ipc.RPC: Problem connecting to server: ec2-67-202-58-97.compute-1.amazonaws.com/10.251.75.165:9001
2008-03-10 15:31:12,465 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ec2-67-202-58-97.compute-1.amazonaws.com/10.251.75.165:9001. Already tried 1 time(s).
2008-03-10 15:32:13,475 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ec2-67-202-58-97.compute-1.amazonaws.com/10.251.75.165:9001. Already tried 2 time(s).

Last but not least, here is my site conf:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>s3://lookhad</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>2DFGTTFSDFDSZU5SDSD7S5202</value>
  </property>
  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>RUWgsdfsd67SFDfsdflaI9Gjzfsd8789ksd2r1PtG</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>ec2-67-202-58-97.compute-1.amazonaws.com:9001</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
</configuration>

The master node is not listening on localhost:

[EMAIL PROTECTED]:~/hadoop-0.16.0$ netstat -an | grep 9001
tcp 0 0 10.251.75.165:9001 0.0.0.0:* LISTEN

My conclusions so far are:

1.) It's not a general connectivity problem, because I can connect to port 22 without any problems.
2.) OTOH, on port 9001, inside the same group, connectivity seems to be limited.
3.) All AWS docs tell me that VMs in one group have no firewalls in place.

So what is happening here? Any ideas?

Andreas