By default, Hadoop writes its .pid files to /tmp. On most Linux distributions, a cleanup job (e.g. tmpwatch) periodically deletes files there that haven't been modified in a while. The start/stop scripts merely look for those pid files, so once they're gone, stop-dfs.sh reports "no namenode to stop" even though the daemons are still running. The solution is to move the pid files somewhere sane, maybe /var/run/hadoop/.
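A minimal sketch of the fix (directory path and the 'hadoop' user are illustrative): set HADOOP_PID_DIR in conf/hadoop-env.sh on every node, create the directory, and restart the daemons so new pid files land outside /tmp:

    # conf/hadoop-env.sh -- keep pid files out of /tmp so the
    # distro's tmp cleaner can't delete them
    export HADOOP_PID_DIR=/var/run/hadoop

    # create the directory first, owned by whichever user runs the
    # daemons (assumed here to be 'hadoop')
    sudo mkdir -p /var/run/hadoop
    sudo chown hadoop:hadoop /var/run/hadoop

Note that since the stop scripts only consult the pid files, any daemons whose pid files have already been deleted still have to be killed by hand one last time before restarting with the new setting.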
On Jun 1, 2010, at 1:19 PM, C J wrote:

> Hi,
>
> I have a brand new setup of hadoop machine cluster with 50 machines. I see
> some weird issues come up with the cluster with time....Things run just fine
> for a few days and then when I try to run stop-dfs.sh, it says
>
> no namenode to stop
> hadoop-07: no data node to stop
> hadoop-08: no data node to stop
> .
> .
> .
> hadoop-03: no secondarynamenode to stop
>
> When I go to these machines, the data node is actually running.
>
> Any idea what can cause issue like this? The last time it happened I killed
> all the running datanodes manually and then started the dfs. It started fine.
> After that even the stop-dfs.sh worked as expected. But now it got back to
> the same situation again.
>
> One more thing I see a lot of left over running "Child" tasks from task
> attempts on these machines.
>
> Appreciate any help.
>
> Thanks,
> C