Re: High load on datanode startup

Darrell Taylor Wed, 09 May 2012 14:28:25 -0700

On Wed, May 9, 2012 at 10:00 PM, Serge Blazhiyevskyy <
serge.blazhiyevs...@nice.com> wrote:


> Looks like you have some under-replicated blocks. Does that number
> decrease if you run fsck multiple times?
>

Yes, since my last post it's now down to 353....

Status: HEALTHY
 Total size:    246983628437 B (Total open files size: 372 B)
 Total dirs:    15172
 Total files:   39637 (Files currently being written: 7)
 Total blocks (validated):      41046 (avg. block size 6017239 B) (Total open file blocks (not validated): 6)
 Minimally replicated blocks:   41046 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       353 (0.86001074 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.016981
 Corrupt blocks:                0
 Missing replicas:              1774 (1.4325514 %)
 Number of data-nodes:          5
 Number of racks:               1
FSCK ended at Wed May 09 21:26:40 UTC 2012 in 904 milliseconds
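
To watch whether the under-replicated count keeps falling between fsck runs, something like the following could be used. It is only a rough sketch: the function names and the regular expressions are my own, it assumes the summary has been saved as plain text with the layout shown above (no quote prefixes), and the sample values are copied from the two summaries in this thread. The resulting rate is approximate, because the total block count also changed between the two runs.

```python
import re
from datetime import datetime

# Matches summary lines such as " Under-replicated blocks:       353 (0.86001074 %)"
_COUNT = re.compile(
    r"^\s*(Under-replicated blocks|Missing replicas|Corrupt blocks):\s+(\d+)", re.M
)
# Matches "FSCK ended at Wed May 09 21:26:40 UTC 2012 in 904 milliseconds"
_ENDED = re.compile(
    r"^FSCK ended at \w{3} (\w{3} \d{2} \d{2}:\d{2}:\d{2}) \w+ (\d{4})", re.M
)


def parse_fsck_summary(text):
    """Return the counters and the end time found in one fsck summary."""
    summary = {name: int(value) for name, value in _COUNT.findall(text)}
    ended = _ENDED.search(text)
    if ended:
        stamp = " ".join(ended.groups())  # e.g. "May 09 21:26:40 2012"
        summary["ended"] = datetime.strptime(stamp, "%b %d %H:%M:%S %Y")
    return summary


def recovery_rate(earlier, later):
    """Under-replicated blocks cleared per minute between two summaries."""
    cleared = earlier["Under-replicated blocks"] - later["Under-replicated blocks"]
    minutes = (later["ended"] - earlier["ended"]).total_seconds() / 60
    return cleared / minutes


# Sample data: the relevant lines of the two summaries posted in this thread.
EARLIER = """\
 Under-replicated blocks:       1410 (3.4680374 %)
 Corrupt blocks:                0
 Missing replicas:              2831 (2.3279145 %)
FSCK ended at Wed May 09 19:19:11 UTC 2012 in 980 milliseconds
"""

LATER = """\
 Under-replicated blocks:       353 (0.86001074 %)
 Corrupt blocks:                0
 Missing replicas:              1774 (1.4325514 %)
FSCK ended at Wed May 09 21:26:40 UTC 2012 in 904 milliseconds
"""

if __name__ == "__main__":
    rate = recovery_rate(parse_fsck_summary(EARLIER), parse_fsck_summary(LATER))
    print("under-replicated blocks cleared per minute: %.1f" % rate)
```

A rate that stays positive across successive runs would suggest the namenode is still re-replicating; a rate near zero with a non-zero count would be worth a closer look.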




>
>
> Regards,
> Serge
>
> On 5/9/12 12:23 PM, "Darrell Taylor" <darrell.tay...@gmail.com> wrote:
>
> >On Wed, May 9, 2012 at 6:04 PM, Serge Blazhiyevskyy <
> >serge.blazhiyevs...@nice.com> wrote:
> >
> >>
> >> What does the response from fsck look like?
> >>
> >>
> >[snip lots of stuff about under replicated blocks]
> >
> >......Status: HEALTHY
> > Total size:    246858876262 B (Total open files size: 372 B)
> > Total dirs:    14914
> > Total files:   39248 (Files currently being written: 4)
> > Total blocks (validated):      40657 (avg. block size 6071743 B) (Total open file blocks (not validated): 4)
> > Minimally replicated blocks:   40657 (100.0 %)
> > Over-replicated blocks:        0 (0.0 %)
> > Under-replicated blocks:       1410 (3.4680374 %)
> > Mis-replicated blocks:         0 (0.0 %)
> > Default replication factor:    3
> > Average block replication:     2.9911454
> > Corrupt blocks:                0
> > Missing replicas:              2831 (2.3279145 %)
> > Number of data-nodes:          5
> > Number of racks:               1
> >FSCK ended at Wed May 09 19:19:11 UTC 2012 in 980 milliseconds
> >
> >
> >Further information to add to this: it appears to be affecting 2 nodes in
> >the cluster, one more than the other though.  In the last couple of hours
> >one of the nodes has also experienced high load; this has now dropped, but
> >both of these nodes are now considered dead by the namenode.  The load on
> >the first box is still increasing, currently 234! I think I might have to
> >reboot it via IPMI.
> >
> >
> >>
> >> hadoop fsck /
> >>
> >>
> >> It might be the case that some of the blocks are misreplicated
> >>
> >>
> >> Serge
> >>
> >> Hadoopway.blogspot.com
> >>
> >>
> >>
> >>
> >>
> >> On 5/9/12 9:58 AM, "Darrell Taylor" <darrell.tay...@gmail.com> wrote:
> >>
> >> >On Wed, May 9, 2012 at 5:56 PM, Serge Blazhiyevskyy <
> >> >serge.blazhiyevs...@nice.com> wrote:
> >> >
> >> >> Take a look at your data distribution for that cluster. Maybe, it is
> >> >> unbalanced.
> >> >>
> >> >>
> >> >> Run balancer, if it is...
> >> >>
> >> >
> >> >The cluster is balanced, I ran balancer yesterday.  Oddly enough the
> >> >problem started after I had run the balancer.
> >> >
> >> >I'm running CDH3 btw.
> >> >
> >> >
> >> >
> >> >>
> >> >> Regards,
> >> >> Serge
> >> >>
> >> >> hadoopway.blogspot.com
> >> >>
> >> >>
> >> >>
> >> >> On 5/9/12 9:52 AM, "Darrell Taylor" <darrell.tay...@gmail.com> wrote:
> >> >>
> >> >> >Hi,
> >> >> >
> >> >> >I wonder if someone could give some pointers with a problem I'm having?
> >> >> >
> >> >> >I have a 7 machine cluster set up for testing and we have been pouring
> >> >> >data into it for a week without issue. We have learnt several things
> >> >> >along the way and solved all the problems up to now by searching online,
> >> >> >but now I'm stuck.  One of the data nodes decided to have a load of 70+
> >> >> >this morning. Stopping the datanode and tasktracker brought it back to
> >> >> >normal, but every time I start the datanode again the load shoots
> >> >> >through the roof, and all I get in the logs is:
> >> >> >
> >> >> >STARTUP_MSG: Starting DataNode
> >> >> >
> >> >> >
> >> >> >STARTUP_MSG:   host = pl464/10.20.16.64
> >> >> >
> >> >> >
> >> >> >STARTUP_MSG:   args = []
> >> >> >
> >> >> >
> >> >> >STARTUP_MSG:   version = 0.20.2-cdh3u3
> >> >> >
> >> >> >
> >> >> >STARTUP_MSG:   build =
> >> >> >file:///data/1/tmp/nightly_2012-03-20_13-13-48_3/hadoop-0.20-0.20.2+923.197-1~squeeze
> >> >> >-************************************************************/
> >> >> >
> >> >> >
> >> >> >2012-05-09 16:12:05,925 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
> >> >> >
> >> >> >2012-05-09 16:12:06,139 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
> >> >> >
> >> >> >Nothing else.
> >> >> >
> >> >> >The load seems to max out only 1 of the CPUs, but the machine becomes
> >> >> >*very* unresponsive.
> >> >> >
> >> >> >Anybody got any pointers of things I can try?
> >> >> >
> >> >> >Thanks
> >> >> >Darrell.
> >> >>
> >> >>
> >>
> >>
>
>
