I would debug this as a network problem first. While the namenode is
running, can you connect to 192.168.1.179:9000 from the machine the datanode
is on?
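
A quick way to check this from the datanode machine is a plain TCP connect,
independent of Hadoop. The following is only a minimal sketch: the helper
name can_connect and the 5-second timeout are my own choices, and the
address and port are the ones shown in your datanode log.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # Covers connection refused, no route to host, timeouts and DNS errors.
        print(f"connect to {host}:{port} failed: {exc}")
        return False


# Example, run on the datanode machine (address taken from the log below):
#   can_connect("192.168.1.179", 9000)
```

If this fails from the datanode machine but succeeds when run on the
namenode machine itself, that would point at a firewall or at the address
the namenode is listening on, rather than at the speed of the machine.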

While the namenode does use more RAM as the cluster size increases, an
overloaded namenode will typically make that obvious in its log messages.
That doesn't occur in your namenode logs; it appears that no datanodes
connected at all.

-Michael

On 5/10/07 7:39 PM, "Cedric Ho" <[EMAIL PROTECTED]> wrote:

> Hi all,
> 
> We were trying to set up Hadoop in our Linux environment. When we
> tried to use a slow machine as the namenode (a Pentium III machine
> with 512 MB RAM), it seemed unable to accept connections from
> the datanodes. (I can access its status over HTTP on port 50070,
> however.)
> 
> But it works fine on a faster machine (Pentium 4, 3 GHz, with 3 GB RAM).
> The settings, etc. are exactly the same.
> 
> The problem seems to be on the namenode. Is it because the machine is slow?
> 
> The version we use is 0.12.3
> 
> Any help is appreciated.
> 
> 
> Following is the log from the abnormal namenode.
> 
> 2007-05-09 18:18:46,998 INFO org.apache.hadoop.dfs.StateChange: STATE*
> Network topology has 0 racks and 0 datanodes
> 2007-05-09 18:18:47,000 INFO org.apache.hadoop.dfs.StateChange: STATE*
> UnderReplicatedBlocks has 0 blocks
> 2007-05-09 18:18:47,432 INFO org.mortbay.util.Credential: Checking
> Resource aliases
> 2007-05-09 18:18:48,051 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
> 2007-05-09 18:18:50,524 INFO org.mortbay.util.Container: Started
> [EMAIL PROTECTED]
> 2007-05-09 18:18:51,064 INFO org.mortbay.util.Container: Started
> WebApplicationContext[/,/]
> 2007-05-09 18:18:51,065 INFO org.mortbay.util.Container: Started
> HttpContext[/logs,/logs]
> 2007-05-09 18:18:51,065 INFO org.mortbay.util.Container: Started
> HttpContext[/static,/static]
> 2007-05-09 18:18:51,147 INFO org.mortbay.http.SocketListener: Started
> SocketListener on 0.0.0.0:50070
> 2007-05-09 18:18:51,148 INFO org.mortbay.util.Container: Started
> [EMAIL PROTECTED]
> 2007-05-09 18:18:51,223 INFO org.apache.hadoop.ipc.Server: IPC Server
> listener on 9000: starting
> 2007-05-09 18:18:51,226 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 9000: starting
> 2007-05-09 18:18:51,227 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 1 on 9000: starting
> 2007-05-09 18:18:51,228 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 2 on 9000: starting
> 2007-05-09 18:18:51,229 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 3 on 9000: starting
> 2007-05-09 18:18:51,391 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 4 on 9000: starting
> 2007-05-09 18:18:51,392 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 5 on 9000: starting
> 2007-05-09 18:18:51,393 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 6 on 9000: starting
> 2007-05-09 18:18:51,394 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 7 on 9000: starting
> 2007-05-09 18:18:51,395 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 9000: starting
> 2007-05-09 18:18:51,397 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 9 on 9000: starting
> 
> 
> And these are from the datanode
> 
> 2007-05-09 18:35:13,263 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
> tried 1 time(s).
> 2007-05-09 18:35:14,266 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
> tried 2 time(s).
> 2007-05-09 18:35:15,270 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
> tried 3 time(s).
> 2007-05-09 18:35:16,274 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
> tried 4 time(s).
> 2007-05-09 18:35:17,279 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
> tried 5 time(s).
> 2007-05-09 18:35:18,283 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
> tried 6 time(s).
> 2007-05-09 18:35:19,288 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
> tried 7 time(s).
> 2007-05-09 18:35:20,293 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
> tried 8 time(s).
> 2007-05-09 18:35:21,295 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
> tried 9 time(s).
> 2007-05-09 18:35:22,298 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
> tried 10 time(s).
> 2007-05-09 18:35:23,304 INFO org.apache.hadoop.ipc.RPC: Server at
> hadoop01.ourcompany.com/192.168.1.179:9000 not available yet, Zzzzz...
> 2007-05-09 18:35:24,308 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
> tried 1 time(s).
> 2007-05-09 18:35:25,317 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
> tried 2 time(s).
> 2007-05-09 18:35:26,322 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
> tried 3 time(s).
> 
> 
> Thanks,
> Cedric
