[ https://issues.apache.org/jira/browse/HBASE-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4306:
-------------------------

    Fix Version/s: 0.92.0

> Race between CatalogJanitor and LoadBalancer
> --------------------------------------------
>
>                 Key: HBASE-4306
>                 URL: https://issues.apache.org/jira/browse/HBASE-4306
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.92.0, 0.90.5
>
>
> It is possible for the LoadBalancer to try to assign an offline/split region 
> while it is waiting to be CatalogJanitor'ed. It goes like this:
> {quote}
> 2011-08-25 00:32:07,137 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: parent: Daughters; d1, d2 from sv4r22s16,60020,1314211225331
> ...
> (cleaning never happens or whatever)
> ...
> 2011-08-29 13:45:14,561 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=parent, src=sv4r22s16,60020,1314211225331, dest=sv4r19s17,60020,1314218170402
> 2011-08-29 13:45:14,561 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region parent (offlining)
> 2011-08-29 13:45:14,588 INFO org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4r22s16,60020,1314211225331, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for parent but we are not serving it for parent
> {quote}
> In this case it took 4 days of balancing before the balancer finally tried
> to move the parent (which was never deleted because of HBASE-4238), but the
> same thing can happen if the balancer picks the parent just before it is
> cleaned. The end effect is that the balancer stays disabled _forever_, until
> the stuck parent is dealt with.
> The culprit is that the master keeps the region "online" until the
> CatalogJanitor (CJ) calls AssignmentManager.regionOffline, which means the
> parent is still treated like any other region even though it is offline.
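
A minimal sketch of one way to sidestep the race described above, assuming the balancer can see each region's offline and split flags: skip any region marked offline or split when building the list of balance candidates. The names used here (`BalancerFilterSketch`, `Region`, `balanceable`) are hypothetical stand-ins for illustration; they are not HBase's API and not necessarily the fix applied for this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BalancerFilterSketch {

    /** Hypothetical stand-in for a region descriptor; NOT HBase's HRegionInfo. */
    static final class Region {
        final String name;
        final boolean offline; // region has been taken offline
        final boolean split;   // region is a split parent awaiting cleanup

        Region(String name, boolean offline, boolean split) {
            this.name = name;
            this.offline = offline;
            this.split = split;
        }
    }

    /**
     * Returns the names of regions that are safe to hand to a balancer.
     * The input is the set of regions the master still tracks as "online",
     * which (per the report) may include split parents not yet cleaned up.
     */
    static List<String> balanceable(List<Region> trackedAsOnline) {
        List<String> result = new ArrayList<>();
        for (Region r : trackedAsOnline) {
            if (r.offline || r.split) {
                continue; // never try to move a split/offline parent
            }
            result.add(r.name);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Region> tracked = Arrays.asList(
            new Region("parent", true, true), // split parent, not yet janitored
            new Region("d1", false, false),
            new Region("d2", false, false));
        System.out.println(balanceable(tracked));
    }
}
```

Filtering at candidate-selection time only narrows the window from the balancer's side; it does not change when the master stops tracking the parent as online, which is the root cause named in the description.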

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
