On Thu, May 10, 2012 at 9:33 AM, Todd Lipcon <t...@cloudera.com> wrote:

> That's really weird.
>
> If you can reproduce this after a reboot, I'd recommend letting the DN
> run for a minute, and then capturing a "jstack <pid of dn>" as well as
> the output of "top -H -p <pid of dn> -b -n 5" and sending it to the list.
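A minimal sketch of the capture suggested above, assuming the JDK's "jps" and "jstack" tools are on the PATH and that it runs as the same user as the DataNode process (jstack generally needs that to attach). The function names "datanode_pid" and "capture_dn_diagnostics" and the output file names are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch only. Assumptions (not from the thread): jps and jstack are on PATH,
# and this runs as the same user that owns the DataNode process.

# Print the PID of the DataNode from "jps" output read on stdin.
# jps prints one "<pid> <main class>" pair per line, e.g. "4242 DataNode".
datanode_pid() {
    awk '$2 == "DataNode" { print $1; exit }'
}

# Capture a thread dump plus five batch iterations of per-thread top output.
# $1 = DataNode PID, $2 = output directory (hypothetical, pick your own).
capture_dn_diagnostics() {
    pid="$1"
    out="$2"
    mkdir -p "$out" || return 1
    jstack "$pid" > "$out/jstack.txt" 2>&1
    top -H -p "$pid" -b -n 5 > "$out/top-H.txt" 2>&1
}
```

Usage, after letting the DN run for a minute: pid=$(jps | datanode_pid), then capture_dn_diagnostics "$pid" /tmp/dn-diag, and attach the two text files to the reply.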


What I did after the reboot this morning was to move my dn, nn, and
mapred directories out of the way, create new ones, reformat, and
restart the node. It's now happy.

I'll try moving the directories back later and do the jstack as you suggest.


>
> What JVM/JDK are you using? What OS version?
>

root@pl446:/# dpkg --get-selections | grep java
java-common                                     install
libjaxp1.3-java                                 install
libjaxp1.3-java-gcj                             install
libmysql-java                                   install
libxerces2-java                                 install
libxerces2-java-gcj                             install
sun-java6-bin                                   install
sun-java6-javadb                                install
sun-java6-jdk                                   install
sun-java6-jre                                   install

root@pl446:/# java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)

root@pl446:/# cat /etc/issue
Debian GNU/Linux 6.0 \n \l



>
> -Todd
>
>
> On Wed, May 9, 2012 at 11:57 PM, Darrell Taylor
> <darrell.tay...@gmail.com> wrote:
> > On Wed, May 9, 2012 at 10:52 PM, Raj Vishwanathan <rajv...@yahoo.com> wrote:
> >
> >> The picture is either too small or too pixelated for my eyes :-)
> >>
> >
> > There should be a zoom option in the top right of the page that allows
> > you to view it full size.
> >
> >
> >>
> >> Can you log in to the box and send the output of top? If the system is
> >> unresponsive, it has to be something more than an unbalanced HDFS
> >> cluster, methinks.
> >>
> >
> > Sorry, I'm unable to log in to the box; it's completely unresponsive.
> >
> >
> >>
> >> Raj
> >>
> >>
> >>
> >> >________________________________
> >> > From: Darrell Taylor <darrell.tay...@gmail.com>
> >> >To: common-user@hadoop.apache.org; Raj Vishwanathan <rajv...@yahoo.com>
> >> >Sent: Wednesday, May 9, 2012 2:40 PM
> >> >Subject: Re: High load on datanode startup
> >> >
> >> >On Wed, May 9, 2012 at 10:23 PM, Raj Vishwanathan <rajv...@yahoo.com> wrote:
> >> >
> >> >> When you say 'load', what do you mean? CPU load or something else?
> >> >>
> >> >
> >> >I mean in the Unix sense of load average, i.e. top would show a load of
> >> >(currently) 376.
> >> >
> >> >Looking at Ganglia stats for the box, it's not CPU load as such: the graphs
> >> >show actual CPU usage at 30%, but the number of running processes is
> >> >simply growing linearly. Screenshot of the Ganglia page here:
> >> >
> >> >https://picasaweb.google.com/lh/photo/Q0uFSzyLiriDuDnvyRUikXVR0iWwMibMfH0upnTwi28?feat=directlink
> >> >
> >> >
> >> >
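On Linux, the load figures that top shows come from /proc/loadavg, whose fields are the 1-, 5- and 15-minute load averages, then "runnable/total" kernel scheduling entities (processes and threads), then the most recently assigned PID. Linux also counts tasks in uninterruptible sleep (state D, typically blocked on disk I/O) towards the load average, which is one way a load in the hundreds can coexist with modest CPU usage. A minimal sketch for logging those fields over time, assuming a Linux /proc; the function names "parse_loadavg" and "watch_loadavg" are hypothetical.

```shell
#!/bin/sh
# Sketch only (Linux-specific). /proc/loadavg format:
#   "<1min> <5min> <15min> <runnable>/<total> <last_pid>"

# Print "load1 runnable total" for a /proc/loadavg-style line given as $1.
parse_loadavg() {
    echo "$1" | awk '{ split($4, t, "/"); print $1, t[1], t[2] }'
}

# Print a timestamped sample every $1 seconds (default 10) until interrupted.
watch_loadavg() {
    interval="${1:-10}"
    while :; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$(parse_loadavg "$(cat /proc/loadavg)")"
        sleep "$interval"
    done
}
```

If the 1-minute load keeps climbing while the runnable count stays small, the extra load is coming from tasks that are not runnable, such as those in D state.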
> >> >>
> >> >> Raj
> >> >>
> >> >>
> >> >>
> >> >> >________________________________
> >> >> > From: Darrell Taylor <darrell.tay...@gmail.com>
> >> >> >To: common-user@hadoop.apache.org
> >> >> >Sent: Wednesday, May 9, 2012 9:52 AM
> >> >> >Subject: High load on datanode startup
> >> >> >
> >> >> >Hi,
> >> >> >
> >> >> >I wonder if someone could give me some pointers on a problem I'm
> >> >> >having?
> >> >> >
> >> >> >I have a 7-machine cluster set up for testing, and we have been pouring
> >> >> >data into it for a week without issue. We have learnt several things
> >> >> >along the way and solved all the problems up to now by searching online,
> >> >> >but now I'm stuck. One of the datanodes decided to have a load of 70+
> >> >> >this morning. Stopping the datanode and tasktracker brought it back to
> >> >> >normal, but every time I start the datanode again the load shoots
> >> >> >through the roof, and all I get in the logs is:
> >> >> >
> >> >> >STARTUP_MSG: Starting DataNode
> >> >> >STARTUP_MSG:   host = pl464/10.20.16.64
> >> >> >STARTUP_MSG:   args = []
> >> >> >STARTUP_MSG:   version = 0.20.2-cdh3u3
> >> >> >STARTUP_MSG:   build = file:///data/1/tmp/nightly_2012-03-20_13-13-48_3/hadoop-0.20-0.20.2+923.197-1~squeeze
> >> >> >-************************************************************/
> >> >> >
> >> >> >2012-05-09 16:12:05,925 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
> >> >> >
> >> >> >2012-05-09 16:12:06,139 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
> >> >> >Nothing else.
> >> >> >
> >> >> >The load seems to max out only one of the CPUs, but the machine
> >> >> >becomes *very* unresponsive.
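To check whether the extra load comes from threads actually running (state R) or threads stuck in uninterruptible sleep (state D), one option is to tally thread states from procps "ps -eLo stat=", which prints one STAT value per thread with no header. A minimal sketch, assuming Linux with procps; the function name "count_thread_states" is hypothetical.

```shell
#!/bin/sh
# Sketch only. Tally threads by the first letter of their STAT value
# (R = running/runnable, D = uninterruptible sleep, S = sleeping, ...).
# Reads one STAT value per line on stdin, e.g. from "ps -eLo stat=".
count_thread_states() {
    awk '{ n[substr($1, 1, 1)]++ } END { for (s in n) print s, n[s] }' | sort
}
```

Usage: "ps -eLo stat= | count_thread_states" for the whole box, or "ps -Lo stat= -p <pid of dn> | count_thread_states" for just the DataNode's threads.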
> >> >> >
> >> >> >Has anybody got any pointers on things I can try?
> >> >> >
> >> >> >Thanks
> >> >> >Darrell.
> >> >> >
> >> >> >
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
