Virag Kothari created HBASE-12440:
-------------------------------------

             Summary: Region may remain offline on clean startup under certain 
race condition
                 Key: HBASE-12440
                 URL: https://issues.apache.org/jira/browse/HBASE-12440
             Project: HBase
          Issue Type: Bug
            Reporter: Virag Kothari
            Assignee: Virag Kothari
             Fix For: 0.98.8, 0.99.1


Saw this in prod some time back with ZK-based assignment.
On clean startup, one of the region servers died while the master was doing a 
bulk assign. The bulk assigner then tried to assign the region individually 
using AssignCallable. The AssignCallable does a forceStateToOffline() and skips 
assigning, as it wants the SSH (ServerShutdownHandler) to do the assignment:
{code}
2014-10-16 16:05:23,593 DEBUG master.AssignmentManager [AM.-pool1-t1] : Offline sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., no need to unassign since it's on a dead server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
2014-10-16 16:05:23,593  INFO master.RegionStates [AM.-pool1-t1] : Transition {1f1620174d2542fe7d5b034f3311c3a8 state=PENDING_OPEN, ts=1413475519482, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016} to {1f1620174d2542fe7d5b034f3311c3a8 state=OFFLINE, ts=1413475523593, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016}
2014-10-16 16:05:23,598  INFO master.AssignmentManager [AM.-pool1-t1] : Skip assigning sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., it is on a dead but not processed yet server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
{code}
But the SSH won't assign the region, as it is offline but not in transition:
{code}
2014-10-16 16:05:24,606  INFO handler.ServerShutdownHandler [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Reassigning 0 region(s) that gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 was carrying (and 0 regions(s) that were opening on this server)
2014-10-16 16:05:24,606 DEBUG master.DeadServer [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Finished processing gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
{code}
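
The interleaving above can be replayed with a toy model. This is only a sketch of the reported sequence, not HBase code: the class OfflineRaceSketch, its methods, and the region/server names "r1" and "rs1" are all hypothetical, and it assumes that forcing a region to OFFLINE also drops it from the regions-in-transition map (which is what "offline but not in transition" suggests).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the race described above. Not HBase code; every name is hypothetical. */
class OfflineRaceSketch {
    enum State { PENDING_OPEN, OFFLINE, OPEN }

    final Map<String, State> states = new HashMap<>();
    // Regions in transition, mapped to the server they were being opened on.
    final Map<String, String> inTransition = new HashMap<>();
    // Servers known to be dead whose shutdown has not been processed yet.
    final Set<String> deadUnprocessed = new HashSet<>();

    /** Bulk assign has sent the open request to the server. */
    void bulkAssign(String region, String server) {
        states.put(region, State.PENDING_OPEN);
        inTransition.put(region, server);
    }

    /** Individual retry: force OFFLINE, then skip if the server is dead but unprocessed. */
    boolean retryAssign(String region) {
        // Assumption: forcing OFFLINE removes the region from the in-transition map.
        String server = inTransition.remove(region);
        states.put(region, State.OFFLINE);
        if (deadUnprocessed.contains(server)) {
            return false; // "Skip assigning ... dead but not processed yet server"
        }
        states.put(region, State.OPEN);
        return true;
    }

    /** Shutdown handling: reassign only regions still in transition on the dead server. */
    List<String> handleShutdown(String server) {
        List<String> reassigned = new ArrayList<>();
        for (Map.Entry<String, String> e : inTransition.entrySet()) {
            if (e.getValue().equals(server)) {
                reassigned.add(e.getKey());
            }
        }
        for (String region : reassigned) {
            inTransition.remove(region);
            states.put(region, State.OPEN);
        }
        deadUnprocessed.remove(server);
        return reassigned;
    }

    /** Runs the interleaving from the logs and returns the region's final state. */
    static State replay() {
        OfflineRaceSketch m = new OfflineRaceSketch();
        m.bulkAssign("r1", "rs1");
        m.deadUnprocessed.add("rs1"); // region server dies during bulk assign
        m.retryAssign("r1");          // forced OFFLINE, assignment skipped
        m.handleShutdown("rs1");      // finds 0 regions to reassign
        return m.states.get("r1");
    }
}
```

In this model, replay() leaves "r1" in OFFLINE: the retry hands responsibility to the shutdown handler, but the handler no longer sees the region.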

In zk-less assignment, both the bulk assigner (invoking AssignCallable) and the 
SSH may try to assign the region. But as they both go through a lock, only one 
will succeed, so this doesn't seem to be an issue there.
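
A minimal sketch of why a lock makes the double attempt harmless, assuming a single per-region lock and a check-then-assign step inside it. LockedAssignSketch and its members are hypothetical names for illustration; the actual locking in the assignment code may look quite different.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Illustration only: two contenders race to assign one region under a lock. */
class LockedAssignSketch {
    private final ReentrantLock regionLock = new ReentrantLock();
    private boolean assigned = false; // guarded by regionLock
    final AtomicInteger assignments = new AtomicInteger();

    /** Returns true only for the caller that actually performs the assignment. */
    boolean tryAssign() {
        regionLock.lock();
        try {
            if (assigned) {
                return false; // the other contender already assigned the region
            }
            assigned = true;
            assignments.incrementAndGet();
            return true;
        } finally {
            regionLock.unlock();
        }
    }

    /** Starts two contending threads and returns how many assignments happened. */
    static int race() {
        LockedAssignSketch region = new LockedAssignSketch();
        CountDownLatch start = new CountDownLatch(1);
        Runnable contender = () -> {
            try {
                start.await(); // release both threads at the same time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            region.tryAssign();
        };
        Thread a = new Thread(contender, "assign-callable");
        Thread b = new Thread(contender, "shutdown-handler");
        a.start();
        b.start();
        start.countDown();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return region.assignments.get();
    }
}
```

Whichever thread takes the lock first performs the assignment; the second sees the region already assigned and backs off, so race() counts exactly one assignment.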

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
