Hi All,

I was wondering if anyone could help me figure out what's going wrong in my five-node Hadoop cluster, please?
It consists of:

1. NameNode

    hduser@namenode:/usr/local/hadoop$ jps
    13049 DataNode
    13387 Jps
    12740 NameNode
    13316 SecondaryNameNode

2. JobTracker

    hduser@jobtracker:/usr/local/hadoop$ jps
    21817 TaskTracker
    21448 DataNode
    21542 JobTracker
    21862 Jps

3. Slave1

    hduser@slave1:/usr/local/hadoop$ jps
    21226 DataNode
    21514 Jps
    21463 TaskTracker

4. Slave2

    hduser@slave2:/usr/local/hadoop$ jps
    20938 Jps
    20650 DataNode
    20887 TaskTracker

5. Slave3

    hduser@slave3:/usr/local/hadoop$ jps
    22145 Jps
    21854 DataNode
    22091 TaskTracker

All DataNodes were started by running start-dfs.sh on the NameNode.
All TaskTrackers were started by running start-mapred.sh on the JobTracker.

When I try to execute a simple wordcount job from the NameNode, I receive the following error:

    12/07/16 19:25:22 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:java.net.ConnectException: Call to jobtracker/10.21.68.218:54311 failed on connection exception: java.net.ConnectException: Connection refused

If I check the jobtracker:

1. I can ping in both directions by both IP and hostname.

2. I can see that the jobtracker is listening on port 54311:

    tcp 0 0 127.0.0.1:54311 0.0.0.0:* LISTEN 1001 425093 21542/java

3. Telnet to this port from the NameNode fails with "Connection refused":

    telnet: Unable to connect to remote host: Connection refused

This issue can be worked around by moving the JobTracker functionality to the NameNode, but when this is done the job is executed on the NameNode rather than being distributed across the cluster.

Checking the log files on the slave nodes, I see the "Server Not Available" messages described on the wiki page below:

http://wiki.apache.org/hadoop/ServerNotAvailable

The DataNodes are not seeing the NameNode, and the TaskTrackers are not seeing the JobTracker. The JobTracker web interface always states that there is only 1 node available. I've gone through the 5 troubleshooting steps provided on that page, but everything looks OK in my environment.
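For anyone who wants to repeat checks 2 and 3 above from a script rather than by hand, here is a minimal Python sketch. The function names listen_address and can_connect are made up for this illustration and are not part of Hadoop. It assumes the netstat line has the column layout shown above (local address in the fourth column).

```python
import socket


def listen_address(netstat_line):
    """Return (host, port) from the local-address column of a netstat line.

    Assumes the column order shown in the post:
    proto, recv-q, send-q, local address, foreign address, state, ...
    """
    fields = netstat_line.split()
    # Split on the last colon so IPv6-style addresses such as ":::54311" also work.
    host, _, port = fields[3].rpartition(":")
    return host, int(port)


def can_connect(host, port, timeout=3.0):
    """Attempt a plain TCP connect, roughly what `telnet host port` does."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False
```

Called on the netstat line quoted above, listen_address returns the address the JobTracker socket is bound to, which can then be compared with the 10.21.68.218 address in the error message. Running can_connect("jobtracker", 54311) from the NameNode should mirror the telnet result.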
Would anyone have any ideas about what could be causing this? Any help would be appreciated.

Cheers,
Ronan