Joe,

We'll need to learn what happened to that region; they usually don't throw up after a few inserts ;)
So in that region server's log, before you tried disabling that table,
do you see anything wrong (exceptions probably)? If you have a web
server, it would be nice to drop the full RS log and the master log.

thx!

J-D

On Wed, Mar 10, 2010 at 5:54 PM, Joe Pepersack <j...@pepersack.net> wrote:
> On 03/10/2010 07:58 PM, Jean-Daniel Cryans wrote:
>>
>> Which HBase version? What's your hardware like? How much data were you
>> inserting? Did you grep the region server logs for any IOException or
>> such? Can we see an excerpt of those logs around the time of the "lock
>> up"?
>>
>
> Version: 0.20.3-1.cloudera
> Hardware: dual Xeon 4 core, 16G, 1.7T disk
> 10x nodes: 1 master, 1 secondary master, 8x regionservers. 2x zookeepers
> running on regionservers
>
>
> It appears to have died after only a few rows were inserted. There's only
> one region shown on the status page. Curiously, that region does NOT show
> up in the list of online regions for the listed regionserver.
>
> Master log, from the point where I ran "drop 'Person'" in the shell:
>
> 010-03-10 20:44:44,812 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.rootScanner scanning meta region {server: 10.40.0.37:60020,
> regionname: -ROOT-,,0, startKey:<>}
> 2010-03-10 20:44:44,815 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.rootScanner scan of 1 row(s) of meta region {server:
> 10.40.0.37:60020, regionname: -ROOT-,,0, startKey:<>} complete
> 2010-03-10 20:44:44,836 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scanning meta region {server: 10.40.0.36:60020,
> regionname: .META.,,1, startKey:<>}
> 2010-03-10 20:44:44,844 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scan of 3 row(s) of meta region {server:
> 10.40.0.36:60020, regionname: .META.,,1, startKey:<>} complete
> 2010-03-10 20:44:44,844 INFO org.apache.hadoop.hbase.master.BaseScanner: All
> 1 .META. region(s) scanned
> 2010-03-10 20:44:45,357 INFO org.apache.hadoop.hbase.master.ServerManager: 5
> region servers, 0 dead, average load 1.2
> 2010-03-10 20:45:03,209 DEBUG
> org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions
> 2010-03-10 20:45:03,209 DEBUG
> org.apache.hadoop.hbase.master.ChangeTableState: Processing regions
> currently being served
> 2010-03-10 20:45:03,210 DEBUG
> org.apache.hadoop.hbase.master.ChangeTableState: Adding region
> Person,,1268251509658 to setClosing list
> 2010-03-10 20:45:04,260 DEBUG
> org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions
> 2010-03-10 20:45:04,260 DEBUG
> org.apache.hadoop.hbase.master.ChangeTableState: Processing regions
> currently being served
> 2010-03-10 20:45:04,260 DEBUG
> org.apache.hadoop.hbase.master.ChangeTableState: Adding region
> Person,,1268251509658 to setClosing list
> 2010-03-10 20:45:05,273 DEBUG
> org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions
> 2010-03-10 20:45:05,273 DEBUG
> org.apache.hadoop.hbase.master.ChangeTableState: Processing regions
> currently being served
> 2010-03-10 20:45:05,273 DEBUG
> org.apache.hadoop.hbase.master.ChangeTableState: Adding region
> Person,,1268251509658 to setClosing list
> 2010-03-10 20:45:06,287 DEBUG
> org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions
> 2010-03-10 20:45:06,287 DEBUG
> org.apache.hadoop.hbase.master.ChangeTableState: Processing regions
> currently being served
> 2010-03-10 20:45:06,287 DEBUG
> org.apache.hadoop.hbase.master.ChangeTableState: Adding region
> Person,,1268251509658 to setClosing list
> 2010-03-10 20:45:08,301 DEBUG
> org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions
> 2010-03-10 20:45:08,301 DEBUG
> org.apache.hadoop.hbase.master.ChangeTableState: Processing regions
> currently being served
> 2010-03-10 20:45:08,301 DEBUG
> org.apache.hadoop.hbase.master.ChangeTableState: Adding region
> Person,,1268251509658 to setClosing list
>
>
> Log from the region server where the region is supposed to be for the same
> time frame:
>
> 2010-03-10 20:43:50,889 DEBUG
> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
> Total=1.6213074MB (1700064), Free=195.8787MB (205393696), Max=197.5MB
> (207093760), Counts: Blocks=0, Access=0, Hit=0, Miss=0, Evictions=0,
> Evicted=0, Ratios: Hit Ratio=NaN%, Miss Ratio=NaN%, Evicted/Run=NaN
> 2010-03-10 20:44:50,889 DEBUG
> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
> Total=1.6213074MB (1700064), Free=195.8787MB (205393696), Max=197.5MB
> (207093760), Counts: Blocks=0, Access=0, Hit=0, Miss=0, Evictions=0,
> Evicted=0, Ratios: Hit Ratio=NaN%, Miss Ratio=NaN%, Evicted/Run=NaN
> 2010-03-10 20:45:04,058 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
> Person,,1268251509658
> 2010-03-10 20:45:04,059 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> MSG_REGION_CLOSE: Person,,1268251509658
> 2010-03-10 20:45:05,062 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
> Person,,1268251509658
> 2010-03-10 20:45:05,063 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> MSG_REGION_CLOSE: Person,,1268251509658
> 2010-03-10 20:45:06,066 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
> Person,,1268251509658
> 2010-03-10 20:45:06,067 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> MSG_REGION_CLOSE: Person,,1268251509658
> 2010-03-10 20:45:07,070 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
> Person,,1268251509658
> 2010-03-10 20:45:07,071 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> MSG_REGION_CLOSE: Person,,1268251509658
> 2010-03-10 20:45:09,079 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
> Person,,1268251509658
> 2010-03-10 20:45:09,079 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> MSG_REGION_CLOSE: Person,,1268251509658
> 2010-03-10 20:45:11,088 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
> Person,,1268251509658
> 2010-03-10 20:45:11,088 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> MSG_REGION_CLOSE: Person,,1268251509658
> 2010-03-10 20:45:15,104 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
> Person,,1268251509658
> 2010-03-10 20:45:15,105 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> MSG_REGION_CLOSE: Person,,1268251509658
> 2010-03-10 20:45:50,889 DEBUG
> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
> Total=1.6213074MB (1700064), Free=195.8787MB (205393696), Max=197.5MB
> (207093760), Counts: Blocks=0, Access=0, Hit=0, Miss=0, Evictions=0,
> Evicted=0, Ratios: Hit Ratio=NaN%, Miss Ratio=NaN%, Evicted/Run=NaN