This bug is driving me crazy! What tools could I use to find out why the slaves are not reported as being part of the cluster? I can't find anything wrong in the log files.
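In case it helps, this is roughly what I've been running to see which datanodes the namenode actually reports (paths are relative to my Hadoop install directory, and the slave's log file name depends on the user and host name):

    # Ask the namenode which datanodes it currently considers live
    # (run on the master, from the Hadoop installation directory).
    bin/hadoop dfsadmin -report

    # On the slave, check whether the datanode log has any content at all;
    # the exact file name follows the hadoop-<user>-datanode-<host>.log pattern.
    ls -l logs/
    cat logs/*-datanode-*.log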
Using Wireshark, I confirmed that the heartbeat between the slaves and the master is working. The SSH communication between the master and the slaves is also working fine. It's like everything is perfect... except that it isn't. The admin pages keep reporting that there is only one node in the cluster (the master acts as a slave too, and that one is working). Maybe the problem is with the admin pages. I'm also seeing VERY slow transfer rates, and I don't see what could cause that either. Any ideas, anyone?

-----Original Message-----
From: Sebastien Rainville [mailto:[EMAIL PROTECTED]]
Sent: November 10, 2007 2:18 PM
To: [email protected]
Subject: cluster startup problem

Hi,

I have a cluster made of only 2 PCs. The master also acts as a slave. The cluster seems to start properly and is functional (I can access the DFS, monitor it with the web interfaces, and there are no errors in the log files...), but it reports that only 1 node is up. For some reason the datanode on the slave doesn't start properly. The weirdest thing is that it is actually listed in the running processes when I run 'jps', and the log file for the datanode exists but is empty... Another weird thing is that the file hadoop.log is empty on the master, so I can't find any debugging information. Also, I don't know what to think about the tasktracker on the slave: its log file seems fine (it reports that it is starting properly), but I can't open its admin page in a browser.

I have another question: what is required for a client application to connect to the cluster? I thought that all I needed was a custom hadoop-site.xml placed in the classpath, but it doesn't work.

Thanks,
Sebastien
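P.S. Regarding the client configuration question at the end of my original message: this is roughly what the client's hadoop-site.xml looks like (the host name "master" and the ports are placeholders for my actual values, copied from the cluster's own hadoop-site.xml):

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <!-- Namenode address; must match what the cluster itself uses. -->
      <property>
        <name>fs.default.name</name>
        <value>master:9000</value>
      </property>
      <!-- Jobtracker address, needed to submit jobs. -->
      <property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
      </property>
    </configuration>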
