Hello guys, I'm requesting a number of machines from a PBS scheduler to run Hadoop. All the Hadoop daemons start normally on the master and the slaves, but the slaves never get any worker tasks. Digging into that, there seems to be some kind of blocking between nodes (?). I don't know how to describe it, except that running "telnet master-node" on a slave should connect, and instead I get this error:

[mark@node67 ~]$ telnet node77
Trying 192.168.1.77...
telnet: connect to address 192.168.1.77: Connection refused
telnet: Unable to connect to remote host: Connection refused

The log on the slave nodes shows the same thing, even though the datanode and tasktracker there were started from the master (?):

2012-01-09 10:04:03,436 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:12123. Already tried 0 time(s).
2012-01-09 10:04:04,439 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:12123. Already tried 1 time(s).
2012-01-09 10:04:05,442 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:12123. Already tried 2 time(s).
2012-01-09 10:04:06,444 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:12123. Already tried 3 time(s).
2012-01-09 10:04:07,446 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:12123. Already tried 4 time(s).
2012-01-09 10:04:08,448 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:12123. Already tried 5 time(s).
2012-01-09 10:04:09,450 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:12123. Already tried 6 time(s).
2012-01-09 10:04:10,452 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:12123. Already tried 7 time(s).
2012-01-09 10:04:11,454 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:12123. Already tried 8 time(s).
2012-01-09 10:04:12,456 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:12123. Already tried 9 time(s).
2012-01-09 10:04:12,456 INFO org.apache.hadoop.ipc.RPC: Server at localhost/127.0.0.1:12123 not available yet, Zzzzz...

Any suggestions of what I can do?

Thanks,
Mark
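
P.S. Plain telnet with no port argument targets port 23, so a refusal there may only mean that no telnet server is listening on node77. A port-specific check is probably more telling. Below is a minimal sketch in Python of such a check; the helper name port_open is made up for illustration, and treating 12123 as the port the master should be listening on is an assumption, since the log only shows that port against localhost.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it is not part of Hadoop.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers "connection refused", timeouts, and name-resolution failures
        return False
```

Run on node67, port_open("node77", 12123) would show whether the master accepts connections on that port, and port_open("localhost", 12123) would show whether anything is listening locally on the address the slave log keeps retrying.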
