You can check the logs on the nodes whose TaskTracker isn't up. The path is "$HADOOP_HOME/logs/". The answer may be in there.
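A rough sketch of that log check, to be run on each node where the TaskTracker is missing. It assumes HADOOP_HOME is set and that the log files have "tasktracker" in their names (e.g. hadoop-<user>-tasktracker-<host>.log); the function name show_tt_errors is just made up for illustration, so adjust it to your install.

```shell
# Hypothetical helper: print recent error lines from the TaskTracker logs.
# Assumption: logs live under $HADOOP_HOME/logs unless a directory is passed in.
show_tt_errors() {
  log_dir="${1:-$HADOOP_HOME/logs}"
  for f in "$log_dir"/*tasktracker*.log; do
    # If the glob matched nothing, report it and stop.
    [ -e "$f" ] || { echo "no tasktracker log found in $log_dir"; return 1; }
    echo "== $f"
    # Show only the last 20 lines that look like errors.
    grep -E 'ERROR|FATAL|Exception' "$f" | tail -n 20
  done
}
```

Calling show_tt_errors with no argument reads from $HADOOP_HOME/logs; pass another directory if your logs are elsewhere. Since jps can miss hung processes, as mentioned further down the thread, it may also be worth running ps -ef | grep -i '[t]asktracker' on the same node.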
2011/3/2 bikash sharma <[email protected]>

> Hi Sonal,
> Thanks. I guess you are right. ps -ef exposes such processes.
>
> -bikash
>
> On Tue, Mar 1, 2011 at 1:29 PM, Sonal Goyal <[email protected]> wrote:
>
> > Bikash,
> >
> > I have sometimes found hanging processes which jps does not report, but a
> > ps -ef shows them. Maybe you can check this on the errant nodes..
> >
> > Thanks and Regards,
> > Sonal
> > Hadoop ETL and Data Integration <https://github.com/sonalgoyal/hiho>
> > Nube Technologies <http://www.nubetech.co>
> >
> > <http://in.linkedin.com/in/sonalgoyal>
> >
> > On Tue, Mar 1, 2011 at 7:37 PM, bikash sharma <[email protected]> wrote:
> >
> >> Hi James,
> >> Sorry for the late response. No, the same problem persists. I reformatted
> >> HDFS, stopped mapred and hdfs daemons and restarted them (using start-dfs.sh
> >> and start-mapred.sh from master node). But surprisingly out of 4 nodes
> >> cluster, two nodes have TaskTracker running while other two do not have
> >> TaskTrackers on them (verified using jps). I guess since I have the Hadoop
> >> installed on shared storage, that might be the issue? Btw, how do I start
> >> the services independently on each node?
> >>
> >> -bikash
> >> On Sun, Feb 27, 2011 at 11:05 PM, James Seigel <[email protected]> wrote:
> >>
> >> > .... Did you get it working? What was the fix?
> >> >
> >> > Sent from my mobile. Please excuse the typos.
> >> >
> >> > On 2011-02-27, at 8:43 PM, Simon <[email protected]> wrote:
> >> >
> >> > > Hey Bikash,
> >> > >
> >> > > Maybe you can manually start a tasktracker on the node and see if there are
> >> > > any error messages. Also, don't forget to check your configure files for
> >> > > mapreduce and hdfs and make sure datanode can start successfully first.
> >> > > After all these steps, you can submit a job on the master node and see if
> >> > > there are any communication between these failed nodes and the master node.
> >> > > Post your error messages here if possible.
> >> > >
> >> > > HTH.
> >> > > Simon -
> >> > >
> >> > > On Sat, Feb 26, 2011 at 10:44 AM, bikash sharma <[email protected]> wrote:
> >> > >
> >> > >> Thanks James. Well all the config. files and shared keys are on a shared
> >> > >> storage that is accessed by all the nodes in the cluster.
> >> > >> At times, everything runs fine on initialization, but at other times, the
> >> > >> same problem persists, so was bit confused.
> >> > >> Also, checked the TaskTracker logs on those nodes, there does not seem to be
> >> > >> any error.
> >> > >>
> >> > >> -bikash
> >> > >>
> >> > >> On Sat, Feb 26, 2011 at 10:30 AM, James Seigel <[email protected]> wrote:
> >> > >>
> >> > >>> Maybe your ssh keys aren’t distributed the same on each machine or the
> >> > >>> machines aren’t configured the same?
> >> > >>>
> >> > >>> J
> >> > >>>
> >> > >>> On 2011-02-26, at 8:25 AM, bikash sharma wrote:
> >> > >>>
> >> > >>>> Hi,
> >> > >>>> I have a 10 nodes Hadoop cluster, where I am running some benchmarks for
> >> > >>>> experiments.
> >> > >>>> Surprisingly, when I initialize the Hadoop cluster
> >> > >>>> (hadoop/bin/start-mapred.sh), in many instances, only some nodes have
> >> > >>>> TaskTracker process up (seen using jps), while other nodes do not have
> >> > >>>> TaskTrackers. Could anyone please explain?
> >> > >>>>
> >> > >>>> Thanks,
> >> > >>>> Bikash
> >> > >
> >> > > --
> >> > > Regards,
> >> > > Simon
