What does the response from fsck look like?

hadoop fsck /

It might be the case that some of the blocks are misreplicated.

Serge
Hadoopway.blogspot.com

On 5/9/12 9:58 AM, "Darrell Taylor" <darrell.tay...@gmail.com> wrote:

>On Wed, May 9, 2012 at 5:56 PM, Serge Blazhiyevskyy <
>serge.blazhiyevs...@nice.com> wrote:
>
>> Take a look at your data distribution for that cluster. Maybe, it is
>> unbalanced.
>>
>> Run balancer, if it is.
>>
>
>The cluster is balanced, I ran balancer yesterday. Oddly enough the
>problem started after I had run the balancer.
>
>I'm running CDH3 btw.
>
>>
>> Regards,
>> Serge
>>
>> hadoopway.blogspot.com
>>
>> On 5/9/12 9:52 AM, "Darrell Taylor" <darrell.tay...@gmail.com> wrote:
>>
>> >Hi,
>> >
>> >I wonder if someone could give some pointers with a problem I'm having?
>> >
>> >I have a 7 machine cluster setup for testing and we have been pouring data
>> >into it for a week without issue, have learnt several thing along the way
>> >and solved all the problems up to now by searching online, but now I'm
>> >stuck. One of the data nodes decided to have a load of 70+ this morning,
>> >stopping datanode and tasktracker brought it back to normal, but every time
>> >I start the datanode again the load shoots through the roof, and all I get
>> >in the logs is :
>> >
>> >STARTUP_MSG: Starting DataNode
>> >STARTUP_MSG: host = pl464/10.20.16.64
>> >STARTUP_MSG: args = []
>> >STARTUP_MSG: version = 0.20.2-cdh3u3
>> >STARTUP_MSG: build =
>> >file:///data/1/tmp/nightly_2012-03-20_13-13-48_3/hadoop-0.20-0.20.2+923.197-1~squeeze
>> >-************************************************************/
>> >
>> >2012-05-09 16:12:05,925 INFO
>> >org.apache.hadoop.security.UserGroupInformation: JAAS Configuration
>> >already set up for Hadoop, not re-installing.
>> >
>> >2012-05-09 16:12:06,139 INFO
>> >org.apache.hadoop.security.UserGroupInformation: JAAS Configuration
>> >already set up for Hadoop, not re-installing.
>> >
>> >Nothing else.
>> >
>> >The load seems to max out only 1 of the CPUs, but the machine becomes
>> >*very* unresponsive
>> >
>> >Anybody got any pointers of things I can try?
>> >
>> >Thanks
>> >Darrell.
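
If it helps when reading the fsck output asked about at the top of this message, here is a rough Python sketch that pulls the block counters out of the summary. The counter labels in COUNTERS are an assumption based on what the fsck summary usually prints; they may differ on your version, so adjust them to match what you actually see. The helper names (parse_fsck_summary, run_fsck) are made up for this example.

```python
import re
import subprocess

# Assumed labels from the fsck summary section; adjust to match your output.
COUNTERS = (
    "Under-replicated blocks",
    "Mis-replicated blocks",
    "Corrupt blocks",
    "Missing replicas",
)


def parse_fsck_summary(text):
    """Return {label: count} for each assumed counter found in the fsck text."""
    found = {}
    for label in COUNTERS:
        # Match e.g. " Mis-replicated blocks:   0 (0.0 %)" and keep the count.
        pattern = r"^\s*" + re.escape(label) + r":\s*(\d+)"
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            found[label] = int(match.group(1))
    return found


def run_fsck(path="/"):
    """Run `hadoop fsck <path>` and return its standard output as text."""
    result = subprocess.run(
        ["hadoop", "fsck", path],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout
```

On a node with the hadoop client on the PATH, parse_fsck_summary(run_fsck("/")) should give a small dictionary; a non-zero "Mis-replicated blocks" count would fit the misreplication guess above.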