No, the slave that is not appearing is the remote one. In the slaves
file, I use the fqdn for both the local host and the remote one. I added
another slave and it behaves the same way.
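For reference, the conf/slaves file is just one host per line. A sketch of what is described above, with hypothetical hostnames (master.example.com and slave1.example.com are placeholders, not names from this thread):

```
master.example.com
slave1.example.com
```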

Just for testing purposes, I changed the value of fs.default.name in
hadoop-site.xml on the slave to some erroneous value, and then I saw an
error in its datanode log file, as expected. But when I change it back
to point to the master node, it doesn't log a success message; it just
logs nothing... How can that be?
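For comparison, the property being toggled above lives in the slave's hadoop-site.xml and must point at the master's namenode. A minimal sketch, assuming a hypothetical hostname and port (master.example.com:9000 is a placeholder; Hadoop versions of this era use the plain host:port form):

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>master.example.com:9000</value>
  </property>
</configuration>
```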



-----Original Message-----
From: Dennis Kubes [mailto:[EMAIL PROTECTED] 
Sent: November 10, 2007 7:47 PM
To: [email protected]
Subject: Re: cluster startup problem

Is the slave that is NOT appearing the local one? (That would be my guess.)
In your slaves file, do you have it set to localhost or the fqdn of the
host? If localhost, is localhost set up in your hosts file?
That is where I would start.
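To illustrate the hosts-file check, a minimal /etc/hosts covering both a localhost entry and fqdn entries might look like this (all addresses and names below are placeholders, not values from this thread):

```
127.0.0.1      localhost
192.168.0.10   master.example.com   master
192.168.0.11   slave1.example.com   slave1
```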

Dennis Kubes

Sebastien Rainville wrote:
> This bug is driving me crazy! What tools could I use to find out why
> slaves are not reported as part of the cluster? I can't find anything
> wrong in the log files.
> 
> Using Wireshark, I confirmed that the heartbeat between the slaves
> and the master is working. The ssh communication between the master
> and the slaves is also working fine.
> 
> It's like everything is perfect... except for the fact that it's not...
> The admin pages keep reporting that there's only one node in the cluster
> (the master acts as a slave too, and that one is working). Maybe the
> problem is with the admin pages... Also, I'm seeing VERY slow transfer
> rates and I don't see what could cause that either...
> 
> Does anyone have any ideas?
> 
> 
> 
> 
> -----Original Message-----
> From: Sebastien Rainville [mailto:[EMAIL PROTECTED] 
> Sent: November 10, 2007 2:18 PM
> To: [email protected]
> Subject: cluster startup problem
> 
> Hi,
> 
>  
> 
> I have a cluster made of only 2 PCs. The master also acts as a slave.
> The cluster seems to start properly. It is functional (I can access the
> dfs, monitor it with the web interfaces, no errors in the log files...)
> but it reports that only 1 node is up. For some reason the datanode on
> the slave doesn't start properly. The weirdest thing is that it is
> actually listed in the running processes when I run the command 'jps',
> and the log file for the datanode exists but is empty... Another weird
> thing is that the file hadoop.log is empty on the master. So, I can't
> find any debugging information. Also, I don't know what to think about
> the tasktracker on the slave... the log file seems fine (reporting that
> it is starting properly), but I can't open the admin page in a browser.
> 
>  
> 
> I have another question... what is required for a client application to
> connect to the cluster? I thought that all I needed was a custom
> hadoop-site.xml placed on the classpath, but it doesn't work.
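As a sketch of what such a client-side hadoop-site.xml needs: the client must at least know where the namenode is, and, for job submission, the jobtracker as well. The hostnames and ports below are placeholders, not values from this thread:

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>master.example.com:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master.example.com:9001</value>
  </property>
</configuration>
```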
> 
>  
> 
> Thanks,
> 
> Sebastien
> 
>  
> 
