Find the MR task that failed. Click through the UI to look at its
logs. They may have interesting info. It's probably complaining about
a region not being available (a NotServingRegionException, or NSRE).
Figure out which region it is. Use the region historian, or grep for
the region name in the master logs, filtering out the metaScanner
noise, to see if you can piece together the region's history around
the failure. Look too at the loading around failure time. Were you
swapping, etc.? (Ganglia or some such helps here.)
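For example, something like the below (the log path and region name
here are made up; substitute your own -- region names look like
TABLENAME,STARTKEY,TIMESTAMP):

  grep 'TABLENAME,startrow,1234567890' logs/hbase-master.log | grep -v metaScanner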
You might also test that the table is still wholesome -- that the MR
job didn't damage it. A quick check that all regions are online and
accessible is to scan for a column whose column family exists but
whose qualifier you know is not present: e.g. if you have the column
family 'page' and you know there is no column 'page:xyz', scan with
that (enable DEBUG in log4j so you can see regions being loaded as
the scan progresses): "scan 'TABLENAME', ['page:xyz']".
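If you want that DEBUG output, one way -- assuming the stock HBase
log4j setup -- is a line like the below in conf/log4j.properties:

  log4j.logger.org.apache.hadoop.hbase=DEBUG

Re-run the shell after that and the region lookups should show in the
client log as the scan moves across the table.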
You might need to up the timeouts/retries.
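If you go that route, the client knobs live in hbase-site.xml; e.g.
something like the below (the values are illustrative, not
recommendations -- upping hbase.client.retries.number past its
default should get you beyond the 10 attempts you're seeing):

  <property>
    <name>hbase.client.retries.number</name>
    <value>20</value>
  </property>
  <property>
    <name>hbase.client.pause</name>
    <value>2000</value>
  </property>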
St.Ack
Dru Jensen wrote:
Hi hbase-users,
During a fairly large MR process, on the Reduce cycle as it's writing
its results to a table, I see
org.apache.hadoop.hbase.NotServingRegionException in the region server
log several times, and then I see a split reporting it was successful.
Eventually, the Reduce process fails with
org.apache.hadoop.hbase.client.RetriesExhaustedException after 10
failed attempts.
What can I do to fix it?
Thanks,
Dru