[ https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763197#comment-13763197 ]
Jeffrey Zhong commented on HBASE-9480:
--------------------------------------
Aborting here is not ideal. The abort happens on the master, a single point of
failure that handles region assignment and SSH, so it may have knock-on
effects, or the master may keep aborting.
Since the issue is caused mainly by the calls to
{code}deleteClosingOrClosedNode(region);{code}, which stop the assignment
state machine, I think we can remove them (there are two such calls in this
unassign function).
The longer-term fix should allow unassign to throw an exception, so that
different code paths can handle the failure differently, and should fail fast
on a region move request (from either a user or the balancer) before or during
the move.
> Regions are unexpectedly made offline in certain failure conditions
> -------------------------------------------------------------------
>
> Key: HBASE-9480
> URL: https://issues.apache.org/jira/browse/HBASE-9480
> Project: HBase
> Issue Type: Bug
> Reporter: Devaraj Das
> Priority: Critical
> Fix For: 0.96.0
>
> Attachments: 9480-1.txt
>
>
> Came across this issue (HBASE-9338 test):
> 1. Client issues a request to move a region from ServerA to ServerB
> 2. ServerA is compacting that region and does not close the region
> immediately. In fact, it takes a while to complete the request.
> 3. In the meantime, the master sends another close request.
> 4. ServerA sends it a NotServingRegionException
> 5. Master handles the exception, deletes the znode, and invokes regionOffline
> for the said region.
> 6. ServerA fails to operate on ZK in the CloseRegionHandler since the node is
> deleted.
> The region is permanently offline.
> There are potentially other situations in which the master makes a region
> offline, for example when a RegionServer is offline and the client asks for
> a region to be moved off that server.
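
The six steps quoted above can be replayed with a toy model. This is only a
sketch under stated assumptions: the two maps stand in for ZooKeeper and for
the master's in-memory region states, and the class, method and state names
are illustrative, not taken from the HBase source.

```java
import java.util.HashMap;
import java.util.Map;

final class CloseRaceSketch {
    // Toy stand-in for ZooKeeper: region name -> znode state.
    final Map<String, String> znodes = new HashMap<>();
    // Toy stand-in for the master's in-memory region states.
    final Map<String, String> masterState = new HashMap<>();

    // Steps 1-2: the master asks ServerA to close the region; ServerA is
    // still compacting, so nothing else happens yet.
    void masterRequestsClose(String region) {
        znodes.put(region, "CLOSING");
        masterState.put(region, "PENDING_CLOSE");
    }

    // Steps 3-5: the second close request is answered with
    // NotServingRegionException; the master deletes the znode and marks
    // the region offline.
    void masterHandlesNotServing(String region) {
        znodes.remove(region);
        masterState.put(region, "OFFLINE");
    }

    // Step 6: ServerA finishes closing and tries to move the znode from
    // CLOSING to CLOSED. Returns false when the node is no longer there.
    boolean serverFinishesClose(String region) {
        if (!"CLOSING".equals(znodes.get(region))) {
            return false; // node is gone, so no CLOSED event reaches the master
        }
        znodes.put(region, "CLOSED");
        return true;
    }
}
```

Calling the three methods in order leaves the region marked offline with no
znode left to drive it forward, which matches the stuck state described in
the report.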
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira