[
https://issues.apache.org/jira/browse/HBASE-12440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Purtell resolved HBASE-12440.
------------------------------------
Resolution: Fixed
Fix Version/s: 0.99.2 (was: 0.99.1)
Hadoop Flags: Reviewed
> Region may remain offline on clean startup under certain race condition
> -----------------------------------------------------------------------
>
> Key: HBASE-12440
> URL: https://issues.apache.org/jira/browse/HBASE-12440
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Reporter: Virag Kothari
> Assignee: Virag Kothari
> Fix For: 0.98.8, 0.99.2
>
> Attachments: HBASE-12440-0.98.patch, HBASE-12440-0.98_v2.patch,
> HBASE-12440-branch-1.patch
>
>
> Saw this in prod some time back with zk-based assignment.
> On clean startup, one of the region servers died while the master was doing
> a bulk assign. The bulk assigner then tried to assign the region individually
> using AssignCallable. The AssignCallable does a forceStateToOffline() and
> skips assigning, as it wants the SSH (ServerShutdownHandler) to do the
> assignment:
> {code}
> 2014-10-16 16:05:23,593 DEBUG master.AssignmentManager [AM.-pool1-t1] :
> Offline
> sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8.,
> no need to unassign since it's on a dead server:
> gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
> 2014-10-16 16:05:23,593 INFO master.RegionStates [AM.-pool1-t1] : Transition
> {1f1620174d2542fe7d5b034f3311c3a8 state=PENDING_OPEN, ts=1413475519482,
> server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016} to
> {1f1620174d2542fe7d5b034f3311c3a8 state=OFFLINE, ts=1413475523593,
> server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016}
> 2014-10-16 16:05:23,598 INFO master.AssignmentManager [AM.-pool1-t1] : Skip
> assigning
> sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8.,
> it is on a dead but not processed yet server:
> gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
> {code}
> But the SSH won't assign it, because the region is offline but not in
> transition:
> {code}
> 2014-10-16 16:05:24,606 INFO handler.ServerShutdownHandler
> [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Reassigning 0 region(s) that
> gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 was carrying (and 0
> regions(s) that were opening on this server)
> 2014-10-16 16:05:24,606 DEBUG master.DeadServer
> [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Finished processing
> gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
> {code}
> In zk-less assignment, both the bulk assigner (via AssignCallable) and the
> SSH may try to assign the region. But since both go through a lock, only one
> will succeed, so this doesn't seem to be an issue there.
>
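The zk-based failure mode described above can be modelled in a few lines of Java. This is only a sketch under stated assumptions, not HBase code: the class OfflineRaceSketch, its fields, and the methods bulkAssign, serverDied, retryAssign and regionsForShutdownHandler are hypothetical stand-ins. It also assumes that forcing a region to OFFLINE drops it from the in-transition set, which is inferred from the description ("offline but not in transition") rather than taken from the AssignmentManager source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the race; none of these names are HBase APIs.
class OfflineRaceSketch {
    enum State { PENDING_OPEN, OFFLINE }

    private final Map<String, State> regionStates = new HashMap<>();
    private final Map<String, String> regionToServer = new HashMap<>();
    private final Set<String> regionsInTransition = new HashSet<>();
    private final Set<String> deadUnprocessedServers = new HashSet<>();

    // Bulk assign sent the open request: region is PENDING_OPEN and in transition.
    void bulkAssign(String region, String server) {
        regionStates.put(region, State.PENDING_OPEN);
        regionToServer.put(region, server);
        regionsInTransition.add(region);
    }

    // The server is known to be dead, but its shutdown has not been processed yet.
    void serverDied(String server) {
        deadUnprocessedServers.add(server);
    }

    // Models the individual retry: force the region OFFLINE, then skip the
    // assignment if its server is dead but not yet processed.
    // Returns true only if this path would go on to assign the region.
    boolean retryAssign(String region) {
        regionStates.put(region, State.OFFLINE);
        // Assumption: forcing OFFLINE also clears the in-transition marker.
        regionsInTransition.remove(region);
        return !deadUnprocessedServers.contains(regionToServer.get(region));
    }

    // Models the shutdown handler: it only picks up regions that are still in
    // transition on the dead server (on clean startup nothing is online yet).
    List<String> regionsForShutdownHandler(String deadServer) {
        List<String> toReassign = new ArrayList<>();
        for (String region : regionsInTransition) {
            if (deadServer.equals(regionToServer.get(region))) {
                toReassign.add(region);
            }
        }
        deadUnprocessedServers.remove(deadServer);
        return toReassign;
    }

    // Replays the sequence from the logs and counts how many code paths
    // would actually assign the region.
    static int simulate() {
        OfflineRaceSketch master = new OfflineRaceSketch();
        master.bulkAssign("region-a", "rs1");
        master.serverDied("rs1");
        boolean assignedByRetry = master.retryAssign("region-a");
        List<String> pickedBySsh = master.regionsForShutdownHandler("rs1");
        return (assignedByRetry ? 1 : 0) + pickedBySsh.size();
    }

    public static void main(String[] args) {
        System.out.println("assignment attempts for region-a: " + simulate());
    }
}
```

In this model simulate() returns 0: the retry path defers to the shutdown handler, and the handler finds nothing in transition on the dead server, which corresponds to the "Reassigning 0 region(s)" line in the log above.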