Hi CJ,

One of my CentOS systems was built with the full installation. This installed a base iptables ruleset (sorry, I didn't keep the output from that). All my other systems looked like this:

[r...@msddl01 ~]# iptables --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

I cleared the iptables for my problem system with the following:

/sbin/iptables -P INPUT ACCEPT
/sbin/iptables -P OUTPUT ACCEPT
/sbin/iptables -P FORWARD ACCEPT
/sbin/iptables -Z
/sbin/iptables -F

Once that was done, my node responded to the cluster, accepted jobs, etc.
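One caveat: rules flushed like that usually come back after a reboot, because the init script reloads them from /etc/sysconfig/iptables. On CentOS, something like this should make the open configuration stick (a sketch from memory -- verify on your own system first):

/sbin/iptables -L -n          # confirm the chains are now empty
/sbin/service iptables save   # write the running (empty) rules to /etc/sysconfig/iptables

Or, if your site policy allows it, disable the firewall service entirely with "chkconfig iptables off".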
One thing I'm not clear on from your description below: do your HDFS and map/reduce management web interfaces come up once you start dfs and mapred? My habit is (roughly the sequence sketched below):

- hadoop namenode -format (first time only)
- start-dfs.sh
- start-mapred.sh
- Check the HDFS web interface -- status of the HDFS nodes.
- Check the map/reduce web interface -- status of the task trackers.
- Make sure that both the HDFS and map/reduce interfaces show as up and running before kicking off jobs. HDFS will go into safe mode, and map/reduce will show "initializing" at times.
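For reference, here is that cycle end to end as I'd type it on the master, assuming the stock web UI ports for the 0.18 line (50070 for the NameNode, 50030 for the JobTracker) -- adjust if your hadoop-site.xml overrides them:

cd $HADOOP_HOME
bin/hadoop namenode -format    # first time only -- this wipes dfs.name.dir
bin/start-dfs.sh               # brings up the NameNode and DataNodes
bin/start-mapred.sh            # brings up the JobTracker and TaskTrackers

# Then, before submitting any job, check:
#   HDFS status:       http://<namenode-host>:50070/   (live DataNodes, safe mode flag)
#   Map/reduce status: http://<jobtracker-host>:50030/ (TaskTracker count and state)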
Hope it helps.

Best regards,

Danny

-----Original Message-----
From: C J [mailto:[email protected]]
Sent: Tuesday, June 30, 2009 12:25 PM
To: [email protected]
Subject: Re: problem with running WordCode v0 with a distributed operation

Hi Danny,

Can you be a bit more precise about what was wrong in your iptables?

Thank you

On Tue, Jun 30, 2009 at 5:30 PM, Gross, Danny <[email protected]> wrote:

> Hello,
>
> I suggest that you check iptables on your systems. At one time, one of
> my nodes showed a similar error, and this was the culprit.
>
> Good luck,
>
> Danny
>
> -----Original Message-----
> From: C J [mailto:[email protected]]
> Sent: Tuesday, June 30, 2009 2:15 AM
> To: [email protected]; C J
> Subject: problem with running WordCode v0 with a distributed operation
>
> Help help help,
>
> I am a new user, and I have already spent 2-3 weeks trying to run
> Hadoop 0.18.3.
>
> My settings:
>
> 1. I am on Linux.
> 2. I am using Java 1.6.0.
> 3. I have unpacked the hadoop-0.18.3 folder directly on the desktop.
>
> Working steps:
>
> 1. I have succeeded in running it locally.
> 2. I went through the quick start tutorial and managed to operate
>    Hadoop in the standalone and pseudo-distributed modes.
> 3. I started going through the map/reduce tutorial and managed to
>    run WordCount v1.0 as a standalone operation.
>
> Current status:
>
> 1. I would like to run the WordCount v1.0 example as a distributed
>    operation.
> 2. I went through the steps in the cluster setup tutorial and did the
>    following:
>
>    - On a remote server I am running 4 virtual machines:
>
>      134.130.223.58:1
>      134.130.223.72:1
>      134.130.223.85:1
>      134.130.223.92:1
>
>    - My own machine has the following IP address:
>
>      134.130.222.54
>
>    - The hadoop-env.sh file is the same on all five machines:
>
>      export HADOOP_HEAPSIZE=500
>      export JAVA_HOME="/usr/java/jdk1.6.0_14"
>      export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
>      export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
>      export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
>      export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
>      export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
>      export HADOOP_HOME="/root/Desktop/hadoop-0.18.3"
>      export HADOOP_VERSION="0.18.3"
>      export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
>
>    - The hadoop-site.xml is the same on all five machines:
>
>      <?xml version="1.0"?>
>      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>      <!-- Put site-specific property overrides in this file. -->
>      <configuration>
>        <property>
>          <name>fs.default.name</name>
>          <value>134.130.223.85:9000</value>
>        </property>
>        <property>
>          <name>mapred.job.tracker</name>
>          <value>134.130.223.58:1</value>
>        </property>
>        <property>
>          <name>dfs.name.dir</name>
>          <value>/root/Desktop/hadoop-0.18.3/logstina</value>
>        </property>
>        <property>
>          <name>dfs.data.dir</name>
>          <value>/root/Desktop/hadoop-0.18.3/blockstina</value>
>        </property>
>        <property>
>          <name>mapred.system.dir</name>
>          <value>systemtina</value>
>        </property>
>        <property>
>          <name>mapred.local.dir</name>
>          <value>/root/Desktop/hadoop-0.18.3/tempMapReducetina</value>
>        </property>
>      </configuration>
>
>    - The slaves file is the same on all five machines and contains:
>
>      134.130.223.72
>      134.130.222.54
>      134.130.223.92
>
>    - From the NameNode machine 134.130.223.85 I have formatted a new
>      namenode, started HDFS (bin/start-dfs.sh), and started map/reduce
>      (bin/start-mapred.sh).
>
> Problem:
>
> Since the jar file of the WordCount was already created (by the local
> operation) in the folder usr/tina, I tried running the application
> directly, similarly to the local operation, by typing:
>
> $ bin/hadoop jar usr/tina/wordcount.jar org.myorg.WordCount
>     usr/tina/wordcount/input usr/tina/wordcount/output
>
> Then I got the following error:
>
> 09/06/29 19:38:05 WARN fs.FileSystem: "134.130.223.85:9000" is a deprecated filesystem name. Use "hdfs://134.130.223.85:9000/" instead.
> 09/06/29 19:38:06 INFO ipc.Client: Retrying connect to server: /134.130.223.85:9000. Already tried 0 time(s).
> 09/06/29 19:38:07 INFO ipc.Client: Retrying connect to server: /134.130.223.85:9000. Already tried 1 time(s).
> 09/06/29 19:38:08 INFO ipc.Client: Retrying connect to server: /134.130.223.85:9000. Already tried 2 time(s).
> 09/06/29 19:38:09 INFO ipc.Client: Retrying connect to server: /134.130.223.85:9000. Already tried 3 time(s).
> 09/06/29 19:38:10 INFO ipc.Client: Retrying connect to server: /134.130.223.85:9000. Already tried 4 time(s).
> 09/06/29 19:38:11 INFO ipc.Client: Retrying connect to server: /134.130.223.85:9000. Already tried 5 time(s).
> 09/06/29 19:38:12 INFO ipc.Client: Retrying connect to server: /134.130.223.85:9000. Already tried 6 time(s).
> 09/06/29 19:38:13 INFO ipc.Client: Retrying connect to server: /134.130.223.85:9000. Already tried 7 time(s).
> 09/06/29 19:38:14 INFO ipc.Client: Retrying connect to server: /134.130.223.85:9000. Already tried 8 time(s).
> 09/06/29 19:38:15 INFO ipc.Client: Retrying connect to server: /134.130.223.85:9000. Already tried 9 time(s).
> java.lang.RuntimeException: java.net.ConnectException: Call to /134.130.223.85:9000 failed on connection exception: java.net.ConnectException: Connection refused
>     at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:358)
>     at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:377)
>     at org.myorg.WordCount.main(WordCount.java:53)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>     at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> Caused by: java.net.ConnectException: Call to /134.130.223.85:9000 failed on connection exception: java.net.ConnectException: Connection refused
>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:743)
>     at org.apache.hadoop.ipc.Client.call(Client.java:719)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>     at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:348)
>     at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:103)
>     at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:172)
>     at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:67)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1339)
>     at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:118)
>     at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:354)
>     ... 11 more
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>     at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:301)
>     at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:178)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:820)
>     at org.apache.hadoop.ipc.Client.call(Client.java:705)
>     ... 23 more
>
> Questions:
>
> 1. Can someone help me in solving/debugging this problem?
>
> P.S.: I have tried to stop the HDFS with bin/stop-dfs.sh before
> starting new ones.
>
> Thank you,
>
> C.J
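Given the "Connection refused" trace above, one quick check separates a NameNode that never came up from one that is up but firewalled (the iptables case earlier in this thread). A minimal sketch, assuming netstat and telnet are available and the default log layout -- the log file name pattern is from memory, so adjust to what you actually see under $HADOOP_LOG_DIR:

# On the NameNode machine (134.130.223.85): is anything listening on 9000?
netstat -tlnp | grep 9000

# If nothing is, look at the NameNode log for the startup failure:
tail -n 50 $HADOOP_LOG_DIR/hadoop-*-namenode-*.log

# From the client machine: can the port be reached at all?
telnet 134.130.223.85 9000

If netstat shows a listener but the telnet from the client is refused, suspect iptables on the server; if nothing is listening, the problem is the NameNode itself, not the network.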
