Dru, also check whether this is a case of HBASE-921 <https://issues.apache.org/jira/browse/HBASE-921>; that would make sense if you are not using HBase 0.18.1 and are under heavy load.
J-D

On Thu, Oct 23, 2008 at 2:30 PM, stack <[EMAIL PROTECTED]> wrote:

> Find the MR task that failed. Click through the UI to look at its logs.
> It may have interesting info. It's probably complaining about a region not
> being available (NSRE). Figure out which region it is. Use the region
> historian or grep in the master logs -- 'grep -v metaScanner REGIONNAME' so
> you avoid the metaScanner noise -- to see if you can figure out the region's
> history around the failure. Look too at the load around failure time. Were
> you swapping, etc.? (Ganglia or some such helps here.)
>
> You might also test that the table is still wholesome -- that the MR job
> didn't damage the table. A quick check that all regions are online and
> accessible is to scan for a column whose column family does exist but whose
> qualifier you know is not present: e.g. if you have column family 'page' and
> you know there is no column 'page:xyz', scan with that (enable DEBUG in log4j
> so you can see regions being loaded as the scan progresses):
> "scan 'TABLENAME', ['page:xyz']".
>
> You might need to up the timeouts/retries.
> St.Ack
>
> Dru Jensen wrote:
>
>> Hi hbase-users,
>>
>> During a fairly large MR process, in the Reduce cycle as it's writing its
>> results to a table, I see org.apache.hadoop.hbase.NotServingRegionException
>> in the region server log several times, and then I see a split reporting
>> that it was successful.
>>
>> Eventually, the Reduce process fails with
>> org.apache.hadoop.hbase.client.RetriesExhaustedException after 10 failed
>> attempts.
>>
>> What can I do to fix it?
>>
>> Thanks,
>> Dru
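
The master-log search stack describes (keep the lines that mention the region, drop the metaScanner noise) could be sketched roughly as below. This is only an illustration, not anything shipped with HBase: the function names, the log path, and the region name are all hypothetical placeholders, and the filter is a plain substring match on whatever text your master log contains.

```python
from typing import Iterable, List


def region_history(lines: Iterable[str], region_name: str,
                   noise: str = "metaScanner") -> List[str]:
    """Return the log lines that mention region_name, skipping any line
    that contains the noise marker (by default, metaScanner chatter)."""
    return [
        line.rstrip("\n")
        for line in lines
        if region_name in line and noise not in line
    ]


def print_region_history(log_path: str, region_name: str) -> None:
    """Print the filtered history for one region from a log file.

    log_path and region_name are placeholders supplied by the caller,
    e.g. the path to your master log and the region named in the NSRE.
    """
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for entry in region_history(log, region_name):
            print(entry)
```

With a shell at hand, the same filter is a two-stage pipeline, "grep REGIONNAME MASTER_LOG | grep -v metaScanner", where MASTER_LOG stands for wherever your master log lives.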