Just in case anyone else is seeing a similar problem, this issue was resolved by removing the loopback addresses from the /etc/hosts files. It seems to be a problem on Ubuntu.
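For anyone who wants to check their own nodes, here is a rough sketch (not an official Hadoop tool) that flags cluster hostnames mapped to a loopback address in an /etc/hosts file. The sample hosts text and the `127.0.1.1 jobtracker` entry are assumptions for illustration; the thread doesn't record exactly which entries were removed. The helper name `loopback_hostnames` is made up as well. The clue in the original report is the netstat line showing the JobTracker listening on 127.0.0.1:54311, which most likely means the hostname resolved to loopback on that machine, so connections from other nodes were refused.

```python
import ipaddress


def loopback_hostnames(hosts_text, cluster_hosts):
    """Return {hostname: loopback_ip} for cluster hostnames that an
    /etc/hosts-style text maps to a loopback address (127.0.0.0/8 or ::1)."""
    flagged = {}
    for raw in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not a valid IP in the first column; skip the line
        if not addr.is_loopback:
            continue
        for name in fields[1:]:
            if name in cluster_hosts:
                flagged[name] = str(addr)
    return flagged


# Hypothetical file contents; only the hostnames and 10.21.68.218 come from the thread.
SAMPLE_HOSTS = """\
127.0.0.1    localhost
127.0.1.1    jobtracker   # assumed problem entry
10.21.68.218 jobtracker
"""

CLUSTER = {"namenode", "jobtracker", "slave1", "slave2", "slave3"}

if __name__ == "__main__":
    print(loopback_hostnames(SAMPLE_HOSTS, CLUSTER))
```

To run it against a real node, pass the contents of /etc/hosts (for example `open("/etc/hosts").read()`) instead of `SAMPLE_HOSTS`. An empty result means none of the listed hostnames point at loopback.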
Cheers,
Ronan

On Mon, Jul 16, 2012 at 9:22 PM, Ronan Lehane <ronan.leh...@gmail.com> wrote:

> Thanks for the quick reply Harsh.
> I think you may be onto something with the second suggestion.
>
> I found an earlier thread saying that some of the troubleshooting steps
> outlined below resolved a similar issue for that person:
> http://wiki.apache.org/hadoop/Hbase/Troubleshooting
>
> Like you suggested, the /etc/hosts file definitely looks to be involved as
> I hit different issues depending on what hostnames are set against the
> loopback addresses.
> I'll try reset them to see if it resolves the issue.
>
> Thanks,
> Ronan
>
> On Mon, Jul 16, 2012 at 7:44 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> Ronan,
>>
>> A couple of simple things to ensure first:
>>
>> 1. Make sure the firewall isn't the one at fault here. Best to disable
>>    firewall if you do not need it, or carefully configure the rules to
>>    allow in/out traffic over chosen ports.
>> 2. Ensure that the hostnames fs.default.name and mapred.job.tracker
>>    bind to, are external IP-resolving hostnames and not localhost
>>    (loopback interface bound) addresses.
>>
>> On Tue, Jul 17, 2012 at 12:05 AM, Ronan Lehane <ronan.leh...@gmail.com>
>> wrote:
>> > Hi All,
>> >
>> > I was wondering if anyone could help me figure out what's going wrong
>> > in my five node Hadoop cluster, please?
>> >
>> > It consists of:
>> > 1. NameNode
>> > hduser@namenode:/usr/local/hadoop$ jps
>> > 13049 DataNode
>> > 13387 Jps
>> > 12740 NameNode
>> > 13316 SecondaryNameNode
>> >
>> > 2. JobTracker
>> > hduser@jobtracker:/usr/local/hadoop$ jps
>> > 21817 TaskTracker
>> > 21448 DataNode
>> > 21542 JobTracker
>> > 21862 Jps
>> >
>> > 3. Slave1
>> > hduser@slave1:/usr/local/hadoop$ jps
>> > 21226 DataNode
>> > 21514 Jps
>> > 21463 TaskTracker
>> >
>> > 4. Slave2
>> > hduser@slave2:/usr/local/hadoop$ jps
>> > 20938 Jps
>> > 20650 DataNode
>> > 20887 TaskTracker
>> >
>> > 5. Slave3
>> > hduser@slave3:/usr/local/hadoop$ jps
>> > 22145 Jps
>> > 21854 DataNode
>> > 22091 TaskTracker
>> >
>> > All DataNodes have been kicked off by running start-dfs.sh on the
>> > NameNode
>> > All TaskTrackers have been kicked off by running start-mapred.sh on the
>> > JobTracker
>> >
>> > When I try to execute a simple wordcount job from the NameNode I receive
>> > the following error:
>> > 12/07/16 19:25:22 ERROR security.UserGroupInformation:
>> > PriviledgedActionException as:hduser cause:java.net.ConnectException:
>> > Call to jobtracker/10.21.68.218:54311 failed on connection exception:
>> > java.net.ConnectException: Connection refused
>> >
>> > If I check the jobtracker:
>> > 1. I can ping in both directions by both IP and Hostname
>> > 2. I can see that the jobtracker is listening on port 54311
>> > tcp 0 0 127.0.0.1:54311 0.0.0.0:* LISTEN 1001 425093 21542/java
>> > 3. Telnet to this port from the NameNode fails with "Connection Refused"
>> > telnet: Unable to connect to remote host: Connection refused
>> >
>> > This issue can be worked around by moving the JobTracker functionality
>> > to the NameNode, but when this is done the job is executed on the
>> > NameNode rather than distributed across the cluster.
>> > Checking the log files on the slaves nodes, I see Server Not Available
>> > messages referenced at the below wiki.
>> > http://wiki.apache.org/hadoop/ServerNotAvailable
>> > The Data Nodes not seeing the NameNode and the Task Trackers not seeing
>> > JobTracker.
>> > Checking the JobTracker web interface, it always states there is only 1
>> > node available.
>> >
>> > I've checked the 5 troubleshooting steps provided but it all looks to
>> > be ok in my environment.
>> >
>> > Would anyone have any idea's of what could be causing this?
>> > Any help would be appreciated.
>> >
>> > Cheers,
>> > Ronan
>>
>> --
>> Harsh J