[ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765703#comment-13765703
 ] 

Jimmy Xiang commented on HBASE-9480:
------------------------------------

I am looking into the flaky test and will fix it soon.

bq.  I think you can safely revert the code in HRegionServer because the newly 
added following code resumes region transition after zk node deletion
I am really worried about double-assignment. That's why I want to make sure the 
region is closed before we start to assign it somewhere else. I think it is the 
right thing to differentiate not serving from still closing. We can fix any new 
issues caused by this, right?
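
To illustrate the distinction, here is a minimal sketch of how a master-side handler could treat "still closing" differently from "not serving". It is not the patch itself: CloseReply, MasterAction, and onCloseReply are hypothetical names made up for this example, and the mapping of each reply to an action is an assumption.

```java
/** Hypothetical sketch; none of these names are HBase APIs. */
public class CloseResponseSketch {
    /** What a region server might report for a close request (assumed). */
    enum CloseReply { CLOSED, STILL_CLOSING, NOT_SERVING }

    /** What the master does next (assumed). */
    enum MasterAction { ASSIGN_ELSEWHERE, WAIT_FOR_CLOSE }

    static MasterAction onCloseReply(CloseReply reply) {
        switch (reply) {
            case STILL_CLOSING:
                // The region is still open on the old server (for example,
                // a compaction is running), so assigning it somewhere else
                // now would risk double-assignment.
                return MasterAction.WAIT_FOR_CLOSE;
            case CLOSED:
            case NOT_SERVING:
            default:
                // The old server no longer hosts the region.
                return MasterAction.ASSIGN_ELSEWHERE;
        }
    }

    public static void main(String[] args) {
        for (CloseReply reply : CloseReply.values()) {
            System.out.println(reply + " -> " + onCloseReply(reply));
        }
    }
}
```

The point of the sketch is only that the two replies lead to different actions; how the region server actually signals "still closing" is left open here.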

bq.  I'm wondering if it's possible that you can move the following code inside 
unassign itself immediately after
I thought about this too. I didn't do that because sometimes we don't want to 
re-assign the region right away, for example inside handleRegion when an 
unexpected RS_ZK_REGION_OPENED is received.
                
> Regions are unexpectedly made offline in certain failure conditions
> -------------------------------------------------------------------
>
>                 Key: HBASE-9480
>                 URL: https://issues.apache.org/jira/browse/HBASE-9480
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>            Assignee: Jimmy Xiang
>            Priority: Blocker
>             Fix For: 0.96.0
>
>         Attachments: 9480-1.txt, trunk-9480.patch, trunk-9480_v1.1.patch, 
> trunk-9480_v1.2.patch
>
>
> Came across this issue (HBASE-9338 test):
> 1. Client issues a request to move a region from ServerA to ServerB
> 2. ServerA is compacting that region and doesn't close the region immediately. 
> In fact, it takes a while to complete the request.
> 3. In the meantime, the master sends another close request.
> 4. ServerA responds with a NotServingRegionException.
> 5. Master handles the exception, deletes the znode, and invokes regionOffline 
> for the said region.
> 6. ServerA fails to operate on ZK in the CloseRegionHandler since the node is 
> deleted.
> The region is permanently offline.
> There are potentially other situations in which a RegionServer is offline, 
> the client asks to move a region off that server, and the master makes 
> the region offline.
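
The sequence in the description above can be modeled roughly as follows. This is a toy simulation only: the two sets stand in for ZooKeeper and the master's region states, and every name (OfflineRaceSketch, masterHandlesNotServing, and so on) is hypothetical rather than taken from HBase.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the reported sequence; all names are hypothetical. */
public class OfflineRaceSketch {
    // Stand-in for ZooKeeper: regions that currently have a transition znode.
    static final Set<String> znodes = new HashSet<>();
    // Stand-in for the master's view: regions it has marked offline.
    static final Set<String> offline = new HashSet<>();

    // Steps 1-2: the master asks ServerA to close the region; the close is slow.
    static void masterStartsClose(String region) {
        znodes.add(region);
    }

    // Steps 3-5: the second close request gets "not serving", so the master
    // deletes the znode and marks the region offline.
    static void masterHandlesNotServing(String region) {
        znodes.remove(region);
        offline.add(region);
    }

    // Step 6: ServerA finishes closing and tries to update the znode.
    // Returns false when the node is gone, so the close is never reported.
    static boolean serverFinishesClose(String region) {
        return znodes.contains(region);
    }

    static boolean reportedSequenceLeavesRegionOffline() {
        znodes.clear();
        offline.clear();
        String region = "region-1";
        masterStartsClose(region);
        masterHandlesNotServing(region);
        boolean closeReported = serverFinishesClose(region);
        // Stuck: marked offline, and no close event left to drive a re-assign.
        return offline.contains(region) && !closeReported;
    }

    public static void main(String[] args) {
        System.out.println("region stuck offline: "
            + reportedSequenceLeavesRegionOffline());
    }
}
```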

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
