Race between CatalogJanitor and LoadBalancer
--------------------------------------------
Key: HBASE-4306
URL: https://issues.apache.org/jira/browse/HBASE-4306
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Blocker
Fix For: 0.90.5
It is possible for the LoadBalancer to try to assign an offline split parent
region while that region is still waiting to be cleaned up by the
CatalogJanitor. It goes like this:
{quote}
2011-08-25 00:32:07,137 INFO org.apache.hadoop.hbase.master.ServerManager:
Received REGION_SPLIT: parent: Daughters; d1, d2 from
sv4r22s16,60020,1314211225331
...
(cleaning never happens or whatever)
...
2011-08-29 13:45:14,561 INFO org.apache.hadoop.hbase.master.HMaster: balance
hri=parent, src=sv4r22s16,60020,1314211225331,
dest=sv4r19s17,60020,1314218170402
2011-08-29 13:45:14,561 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Starting unassignment of region parent (offlining)
2011-08-29 13:45:14,588 INFO org.apache.hadoop.hbase.master.AssignmentManager:
Server serverName=sv4r22s16,60020,1314211225331, load=(requests=0, regions=0,
usedHeap=0, maxHeap=0) returned
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: Received close for parent
but we are not serving it for parent
{quote}
Here it took 4 days of balancing before the balancer finally tried to move the
parent (which was never deleted because of HBASE-4238), but the same thing can
happen if the balancer picks the parent just before it is cleaned. The end
effect is that the balancer stays disabled _forever_, until that's fixed.
The culprit here is that the master keeps the parent region "online" in its
in-memory state until AssignmentManager.regionOffline is called by the
CatalogJanitor (CJ), which means the parent is still treated like any other
region although it is actually offline.
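
The bookkeeping problem can be illustrated with a toy model. This is only a
sketch and not HBase code: the class SplitParentRaceSketch and all of its
methods except the name regionOffline are hypothetical, and the filter shown
is one possible guard, not necessarily the fix that went into 0.90.5.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the master's region bookkeeping (hypothetical, not HBase code). */
class SplitParentRaceSketch {
    // region name -> server name: the master's in-memory "online" view
    private final Map<String, String> onlineRegions = new LinkedHashMap<>();
    // regions that have split and are waiting for the janitor
    private final Set<String> splitParents = new HashSet<>();

    void regionOnline(String region, String server) {
        onlineRegions.put(region, server);
    }

    /** A split report brings the daughters online but leaves the parent listed as online. */
    void handleSplitReport(String parent, String d1, String d2) {
        String server = onlineRegions.get(parent);
        splitParents.add(parent);
        onlineRegions.put(d1, server);
        onlineRegions.put(d2, server);
    }

    /** Stand-in for what the janitor triggers once the parent is cleaned. */
    void regionOffline(String region) {
        onlineRegions.remove(region);
        splitParents.remove(region);
    }

    /** Naive balancer input: every region the master believes is online. */
    List<String> naiveBalanceCandidates() {
        return new ArrayList<>(onlineRegions.keySet());
    }

    /** Guarded balancer input: skips parents still waiting for the janitor. */
    List<String> filteredBalanceCandidates() {
        List<String> candidates = new ArrayList<>();
        for (String region : onlineRegions.keySet()) {
            if (!splitParents.contains(region)) {
                candidates.add(region);
            }
        }
        return candidates;
    }
}
```

In this model the parent stays in naiveBalanceCandidates() from the split
report until regionOffline is called, which is the window in which the
balancer can pick it and hit the NotServingRegionException shown in the log.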