Oh yeah, I see. The issue is that if a region was closed and disabled while the first master was running, it isn't assigned anywhere and isn't in transition either (in the code that's called being in RIT). When the new master comes up and disable is called, it checks whether the region is in RIT but not whether it was already disabled, and it fails with an NPE because the region isn't assigned to any server.
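
To make that concrete, here's a rough, self-contained sketch of the failure and of the kind of guard that seems to be missing. It is not the actual AssignmentManager code: the class, its fields, and the names unassignUnguarded/unassignGuarded are made up for illustration, and regions and servers are plain strings. The only things taken from the report are the "region maps to null" state and the "Passed server is null" NPE.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the master's in-memory region state.
// None of these names are real HBase classes or methods.
class RegionStateSketch {
    // region name -> hosting server, or null when the region is not deployed anywhere
    private final Map<String, String> regions = new HashMap<String, String>();
    // regions currently in transition (RIT)
    private final Set<String> regionsInTransition = new HashSet<String>();

    void put(String region, String server) {
        regions.put(region, server);
    }

    void markInTransition(String region) {
        regionsInTransition.add(region);
    }

    // Models the behaviour described above: only RIT is checked, so a region
    // that maps to a null server falls through to the close call.
    String unassignUnguarded(String region) {
        if (regionsInTransition.contains(region)) {
            return "skipped: already in transition";
        }
        return sendRegionClose(regions.get(region), region);
    }

    // Assumed fix: also skip regions that have no server assigned.
    String unassignGuarded(String region) {
        if (regionsInTransition.contains(region)) {
            return "skipped: already in transition";
        }
        String server = regions.get(region);
        if (server == null) {
            return "skipped: not assigned";
        }
        return sendRegionClose(server, region);
    }

    // Stand-in for the RPC to the region server; rejects a null server
    // with the same message as in the reported stack trace.
    private String sendRegionClose(String server, String region) {
        if (server == null) {
            throw new NullPointerException("Passed server is null");
        }
        return "close sent to " + server;
    }
}
```

With a region stored against a null server, unassignUnguarded throws the NPE while unassignGuarded just skips it. Whether the real fix belongs in unassign or in the disable handler is a separate question.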

Calling enable before disable should get you out of the situation?

Would you mind opening a jira?

Thx!

J-D

2011/3/28 Gaojinchao <gaojinc...@huawei.com>:
> HBase version is 0.90.1
>
>
>   private Map<HServerInfo,List<Pair<HRegionInfo,Result>>> rebuildUserRegions()
>   throws IOException {
>     // Region assignment from META
>     List<Result> results = MetaReader.fullScanOfResults(catalogTracker);
>     // Map of offline servers and their regions to be returned
>     Map<HServerInfo,List<Pair<HRegionInfo,Result>>> offlineServers =
>       new TreeMap<HServerInfo,List<Pair<HRegionInfo,Result>>>();
>     // Iterate regions in META
>     for (Result result : results) {
>       Pair<HRegionInfo,HServerInfo> region =
>         MetaReader.metaRowToRegionPairWithInfo(result);
>       if (region == null) continue;
>       HServerInfo regionLocation = region.getSecond();
>       HRegionInfo regionInfo = region.getFirst();
>       if (regionLocation == null) {
>         // Region not being served, add to region map with no assignment
>         // If this needs to be assigned out, it will also be in ZK as RIT
>         this.regions.put(regionInfo, null);
> ---- This looks like a bug in the special case where the HMaster restarts or fails over
>       } else if (!serverManager.isServerOnline(
>
> -----Original Message-----
> From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans
> Sent: March 29, 2011 1:02
> To: user@hbase.apache.org
> Subject: Re: Hmaster had crashed as disabling table
>
> Which HBase version is this?
>
> Thx,
>
> J-D
>
> 2011/3/28 Gaojinchao <gaojinc...@huawei.com>:
>> When the master restarts or fails over, it refreshes the user regions.
>> There seems to be a bug there.
>>
>>
>>     if (regionCount == 0) {
>>       LOG.info("Master startup proceeding: cluster startup");
>>       this.assignmentManager.cleanoutUnassigned();
>>       this.assignmentManager.assignAllUserRegions();
>>     } else {
>>       LOG.info("Master startup proceeding: master failover");
>>       this.assignmentManager.processFailover(); -- when the master restarts or fails over, this refreshes the user regions.
>>     }
>>
>>
>> -----Original Message-----
>> From: Gaojinchao [mailto:gaojinc...@huawei.com]
>> Sent: March 28, 2011 11:41
>> To: user@hbase.apache.org
>> Subject: Hmaster had crashed as disabling table
>>
>> Steps to reproduce:
>> 1. Start up the cluster with an HA master.
>> 2. The active master crashed while it was creating a table with regions.
>> 3. The backup master became active.
>> 4. I tried to drop the table.
>> 5. The active master crashed.
>>
>> I can't drop the table, whatever I do.
>>
>> The log:
>>
>>
>> 2011-03-28 10:51:58,347 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Attemping to disable table ufdr
>> 2011-03-28 10:51:58,374 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Offlining 470 regions.
>> 2011-03-28 10:51:58,377 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,,1301128408707.a9d08c22b8a7b0f902ccffce424252fd. (offlining)
>> 2011-03-28 10:51:58,378 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613810384615,1301128408710.ba1a5fef02bd67b5630802fb2c5707a6. (offlining)
>> 2011-03-28 10:51:58,379 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613810769230,1301128408710.12d027d3c1934f3fd76ef48915461569. (offlining)
>> 2011-03-28 10:51:58,379 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613811153845,1301128408710.9700c58da2d0d1c9306b1d1ff832be1d. (offlining)
>> 2011-03-28 10:51:58,384 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613811538460,1301128408710.862232de569b0c8efdac7ea350f30974. (offlining)
>> 2011-03-28 10:51:58,385 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613811923075,1301128408711.54772ed69d8315e3a562d5e98bf61955. (offlining)
>> 2011-03-28 10:51:58,385 FATAL org.apache.hadoop.hbase.master.HMaster: Remote unexpected exception
>> java.lang.NullPointerException: Passed server is null
>>   at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:581)
>>   at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1093)
>>   at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1040)
>>   at org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler$1.run(DisableTableHandler.java:132)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>   at java.lang.Thread.run(Thread.java:662)
>> 2011-03-28 10:51:58,386 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
>> 2011-03-28 10:51:58,386 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613814230765,1301128408711.2455e205497987dd83f40869c2bf0615. (offlining)
>> 2011-03-28 10:51:58,386 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613813846150,1301128408711.5ea78e17593d2d0d8260fc1b2f58bf7c. (offlining)
>> 2011-03-28 10:51:58,386 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613813461535,1301128408711.dae370c25610aca41ea060db7333f519. (offlining)
>> 2011-03-28 10:51:58,386 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613813076920,1301128408711.a608be1914b40d8aa08d9ffb649826d3. (offlining)
>> 2011-03-28 10:51:58,387 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613814999995,1301128408711.82f646d34013ea99b244e9e1837c4e04. (offlining)
>> 2011-03-28 10:51:58,386 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613812307690,1301128408711.535d47842e7ad7b35be6e98e5f46b407. (offlining)
>> 2011-03-28 10:51:58,385 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613812692305,1301128408711.4efc83d4a99eff9d4df7ce0154ef4c58. (offlining)
>> 2011-03-28 10:51:58,385 FATAL org.apache.hadoop.hbase.master.HMaster: Remote unexpected exception
>> java.lang.NullPointerException: Passed server is null
>>
>