I looked at the commits on trunk and there is nothing new recently. I was seeing some weird corruption and scanner errors on trunk... nuking /hbase and restarting fixed it, so something was obviously wrong with the .META. table.
Looks like what is happening is that findClosestBefore() returns an 'empty' RowResult with absolutely no columns in it. Furthermore, the row id doesn't appear in my 'region list' in the Web UI. So it's not an active, live region; it's some other artifact that is still hanging around. Maybe it's a phantom delete showing up as an entry.

I'm not sure it's worthwhile debugging until after HBASE-1234 comes out. After all, the buggy code is probably being substantially reworked and/or removed.

-ryan

On Sat, Apr 4, 2009 at 2:19 AM, Ryan Rawson <[email protected]> wrote:

> Hey guys,
>
> There seems to be something wrong on trunk... I used to have long
> map-reduce jobs, but now they are failing, unable to commit:
>
> 2009-04-04 01:17:09,279 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
> locateRegionInMeta attempt 5 of 10 failed; retrying after sleep of 8000
> java.io.IOException: HRegionInfo was null or empty in .META.
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:566)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:515)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:484)
> ... etc
>
> Basically mappers get stuck up on commits and make no progress, mapred
> kills them, done.
>
> I've spent some time banging at it - made sure that ulimit -n is good, set
> the ipc handler limit to 30, cranked down the number of maps I'm doing,
> etc. To no avail.
>
> At least I figured out how to debug hadoop jobs a bit.
>
> Anyone have thoughts?
