Hello,

last week we ran into a strange error, and today it happened again. After altering a table and enabling it again, we got a NoSuchColumnFamilyException for some regions when working with the table.
We discovered that the error itself is caused by some regions being assigned to multiple servers and not really being offline when the table was disabled. HMaster claimed that all regions were disabled, but the RegionServers still held some regions online.

This happened on HMaster right after disabling table 'robot':

2010-04-08 14:12:12,127 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Adding region robot,cz.sika.www.\x2180/en/cz-ind/cz-ind-news.htm,1270552929405 to setClosing list
2010-04-08 14:12:13,958 INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_CLOSE: robot,cz.sika.www.\x2180/en/cz-ind/cz-ind-news.htm,1270552929405 from fernet7-v49.ng.seznam.cz,60020,1270631603011; 60 of 70
2010-04-08 14:12:28,048 DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of robot,cz.sika.www.\x2180/en/cz-ind/cz-ind-news.htm,1270552929405, true, reassign: false
2010-04-08 14:12:28,049 INFO org.apache.hadoop.hbase.master.ProcessRegionClose$1: region closed: robot,cz.sika.www.\x2180/en/cz-ind/cz-ind-news.htm,1270552929405
2010-04-08 14:12:28,054 DEBUG org.apache.hadoop.hbase.master.BaseScanner: GET on robot,cz.sika.www.\x2180/en/cz-ind/cz-ind-news.htm,1270552929405 got different startcode than SCAN: sc=0, serverAddress=1270631603011
2010-04-08 14:12:28,054 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of robot,cz.sika.www.\x2180/en/cz-ind/cz-ind-news.htm,1270552929405 is not valid; serverAddress=, startCode=0 unknown.
2010-04-08 14:12:28,240 INFO org.apache.hadoop.hbase.master.RegionManager: Assigning region robot,cz.sika.www.\x2180/en/cz-ind/cz-ind-news.htm,1270552929405 to fernet1-v49.ng.seznam.cz,60020,1270540750568

Now the table is disabled, but the region is online on "fernet1-v49.ng.seznam.cz"!

Is there some race condition?

Regards,
Martin Fiala
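
P.S. In case it helps anyone check their own master log for the same pattern, here is a rough Python sketch. It only assumes the two message formats quoted above ("Adding region ... to setClosing list" and "Assigning region ... to ..."), one log record per line; the function name find_reassigned_after_disable is made up for illustration, and other HBase versions may word these messages differently.

```python
import re

# Patterns copied from the master log lines quoted in this post (an assumption:
# other HBase versions may format these messages differently).
CLOSING = re.compile(r"ChangeTableState: Adding region (?P<region>.+) to setClosing list")
ASSIGN = re.compile(r"RegionManager: Assigning region (?P<region>.+) to (?P<server>\S+)$")


def find_reassigned_after_disable(lines):
    """Return (timestamp, region, server) for every region that the master
    assigned after it had been queued for closing earlier in the same log."""
    closing = set()   # regions seen in a "setClosing list" message
    suspects = []
    for raw in lines:
        line = raw.rstrip()
        m = CLOSING.search(line)
        if m:
            closing.add(m.group("region"))
            continue
        m = ASSIGN.search(line)
        if m and m.group("region") in closing:
            timestamp = line[:23]  # "YYYY-MM-DD HH:MM:SS,mmm"
            suspects.append((timestamp, m.group("region"), m.group("server")))
    return suspects


if __name__ == "__main__":
    import sys
    # Usage: python scan_master_log.py < hbase-master.log
    for ts, region, server in find_reassigned_after_disable(sys.stdin):
        print(ts, region, "->", server)
```

Note that this does not know when the table is enabled again, so assignments after a legitimate enable will also be listed; the timestamps have to be compared by hand against the time of the disable and enable calls.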