Hi, I am trying to set up Hadoop on a two-node cluster, with both nodes running Ubuntu 9.10. I have configured one node as NameNode/JobTracker and the other as DataNode/TaskTracker.
I have the following in the hosts file for the master and the slave:

master> cat /etc/hosts
192.168.0.100 master
192.168.0.102 slave
127.0.0.1 localhost

slave> cat /etc/hosts
192.168.0.100 master
192.168.0.102 slave
127.0.0.1 localhosts

and the configuration files on the master and the slave have:

master
-> core-site.xml -> fs.default.name -> hdfs://localhost:9050
-> hdfs-site.xml -> dfs.replication -> 1
-> mapred-site.xml -> mapred.job.tracker -> localhost:9001

slave
-> core-site.xml -> fs.default.name -> hdfs://master:9050
-> hdfs-site.xml -> dfs.replication -> 1
-> mapred-site.xml -> mapred.job.tracker -> master:9001

When I run the command start-dfs.sh, the NameNode starts without any errors and the script tries to start the DataNode. But the DataNode is not able to connect to the MasterNode. The following is in the hadoop-praveensripati-datanode-slave.log file:

2010-04-02 06:54:35,630 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.0.100:9050. Already tried 9 time(s).
2010-04-02 06:54:35,645 INFO org.apache.hadoop.ipc.RPC: Server at master/192.168.0.100:9050 not available yet, Zzzzz...

1. Able to ping the master from the slave and the other way.
2. Able to ssh into slave from master and the other way.
3. Disabled ipv6 on master and slave. /etc/sysctl.conf has net.ipv6.conf.all.disable_ipv6 = 1.

I wrote a Java SocketClient program to connect from the DataNode to the NameNode at port 9050, and I get the following exception:

java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:525)
    at SocketClient.main(SocketClient.java:23)

Then I stopped the NameNode and DataNode, used a Java program to create a listening socket (at 9050) on the NameNode machine, and was able to connect to it from the DataNode machine using the Java client.

SocketServer.java has

    int port = Integer.parseInt(args[0]);
    ServerSocket srv = new ServerSocket(port);
    Socket socket = srv.accept();

SocketClient.java has

    InetAddress addr = InetAddress.getByName(args[0]);
    int port = Integer.parseInt(args[1]);
    SocketAddress sockaddr = new InetSocketAddress(addr, port);
    Socket sock = new Socket();
    int timeoutMs = 2000;
    sock.connect(sockaddr, timeoutMs);

When I do 'netstat -a | grep 9050' I get

When the NameNode creates the socket ->
tcp 0 0 localhost:9050 *:* LISTEN

When the Java program creates the socket ->
tcp 0 0 *:9050 *:* LISTEN

Why is the DataNode not able to connect to port 9050 on the NameNode, while SocketClient.java connects to SocketServer.java on port 9050? Is there anything different about the way the NameNode creates its socket?

-- Praveen
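
To make the two netstat lines above easier to compare, here is a minimal sketch that opens one listening socket bound only to the loopback address and one bound to the wildcard address. It assumes the difference in the netstat output comes from the address the listener is bound to. The class name BindAddressDemo and the helper listenOn are made up for illustration and are not part of Hadoop. Port 0 is used so the operating system picks a free port for each socket.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindAddressDemo {
    // Hypothetical helper: open a listening socket bound to one specific
    // local address. Port 0 asks the OS for any free port.
    static ServerSocket listenOn(InetAddress addr, int port) throws IOException {
        ServerSocket srv = new ServerSocket();
        srv.bind(new InetSocketAddress(addr, port));
        return srv;
    }

    public static void main(String[] args) throws IOException {
        // Listener bound to loopback only: netstat reports it as localhost:<port>,
        // and only clients on the same machine can reach it.
        try (ServerSocket loopbackOnly = listenOn(InetAddress.getLoopbackAddress(), 0)) {
            System.out.println("loopback listener: " + loopbackOnly.getLocalSocketAddress());
        }

        // Listener bound to the wildcard address: netstat reports it as *:<port>,
        // and it accepts connections arriving on any interface.
        try (ServerSocket allInterfaces = new ServerSocket(0)) {
            System.out.println("wildcard listener: " + allInterfaces.getLocalSocketAddress());
        }
    }
}
```

A listener of the first kind refuses connections that arrive on 192.168.0.100, while one of the second kind accepts them. new ServerSocket(port), as used in SocketServer.java, is of the second kind.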