Hi,

I have a brand new Hadoop cluster of 50 machines. I see some weird issues 
come up over time. Things run just fine for a few days, and then when I try 
to run stop-dfs.sh, it says:

no namenode to stop
hadoop-07: no data node to stop
hadoop-08: no data node to stop
.
.
.
hadoop-03: no secondarynamenode to stop

When I log in to these machines, the datanode process is actually running. 

Any idea what can cause an issue like this? The last time it happened, I 
killed all the running datanodes manually and then started the dfs. It started 
fine. After that, even stop-dfs.sh worked as expected. But now the cluster is 
back in the same situation again.
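
One check that might narrow this down, sketched under an assumption: that the 
stop scripts find each daemon through a pid file written at start-up, so a 
daemon whose pid file has gone missing would be reported as "no ... to stop" 
even while it is running. The function name check_pid_file is made up for 
illustration, and the pid file location has to be filled in for the actual 
setup.

```shell
#!/bin/sh
# Hypothetical helper: report the state of a daemon pid file.
# Prints "missing" if the file does not exist, "alive" if it names a
# process that can be signalled, and "stale" otherwise.
# Run it as the user that owns the daemon; kill -0 also fails on
# processes owned by someone else, which would show up as "stale".
check_pid_file() {
    pidfile="$1"
    if [ ! -f "$pidfile" ]; then
        echo "missing"
        return 1
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the pid can be signalled.
    if kill -0 "$pid" 2>/dev/null; then
        echo "alive"
        return 0
    fi
    echo "stale"
    return 2
}
```

If this prints "missing" on a node where ps still shows the datanode, the pid 
file disappeared after start-up, and the next thing to look at would be 
whatever cleans the directory it lives in. A call might look like 
check_pid_file /tmp/hadoop-$USER-datanode.pid, where that path is a guess and 
not something confirmed for this cluster.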

One more thing: I see a lot of leftover "Child" processes from task attempts 
still running on these machines.

Appreciate any help.

Thanks,
C


      
