[ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768293#comment-13768293
 ] 

Feng Honghua commented on HBASE-9480:
-------------------------------------

bq.(Cc'ing Feng Honghua since he expressed interest in this area too as is 
Jimmy Xiang of course)
Yes. It seems the current master/zk/RS communication pattern (RS updates a zk 
node, master watches for changes to that node), together with the asynchronous 
and 'one-time' nature of zk watches, results in too many corner cases for the 
assignment manager (and region split). I'm drafting a proposal for a new 
master/zk/RS communication pattern. The main theme: the master sends a request 
to the RS, the RS reports progress back to the master, and the master persists 
the request progress in another system table (like the meta table). The reason 
for using a table rather than zk is better throughput/performance for huge 
tables with a large number of regions... [~stack] / [~jxiang]
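
To make the proposed pattern concrete, here is a minimal sketch of the flow (master sends a request, RS reports progress back, master persists each step in a table). Every name in it (ProgressTable, ProgressListener, the Progress states) is hypothetical and chosen only for illustration. None of it is an existing HBase API, and the in-memory map merely stands in for the system table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a master-driven flow; none of these names are HBase APIs. */
class MasterDrivenAssignmentSketch {
    /** Assumed progress states for a close request (illustrative only). */
    enum Progress { REQUESTED, CLOSING, CLOSED }

    /** In-memory stand-in for the proposed system table of request progress. */
    static class ProgressTable {
        private final Map<String, Progress> rows = new LinkedHashMap<>();
        void put(String region, Progress p) { rows.put(region, p); }
        Progress get(String region) { return rows.get(region); }
    }

    /** Callback the region server uses to report progress back to the master. */
    interface ProgressListener {
        void report(String region, Progress p);
    }

    /** Stand-in region server: handles a close request and reports each step. */
    static class RegionServer {
        void closeRegion(String region, ProgressListener master) {
            master.report(region, Progress.CLOSING);
            // The real flush/close work would happen here.
            master.report(region, Progress.CLOSED);
        }
    }

    /** Stand-in master: drives the request and persists every reported step. */
    static class Master {
        final ProgressTable table = new ProgressTable();

        void requestClose(String region, RegionServer rs) {
            table.put(region, Progress.REQUESTED); // persist before sending the request
            rs.closeRegion(region, table::put);    // each RS report is persisted by the master
        }
    }

    public static void main(String[] args) {
        Master master = new Master();
        master.requestClose("region-1", new RegionServer());
        System.out.println(master.table.get("region-1")); // prints CLOSED
    }
}
```

In this sketch the master is the only writer of request state, so there is no watch that can be missed between two updates.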
                
> Regions are unexpectedly made offline in certain failure conditions
> -------------------------------------------------------------------
>
>                 Key: HBASE-9480
>                 URL: https://issues.apache.org/jira/browse/HBASE-9480
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>            Assignee: Jimmy Xiang
>             Fix For: 0.98.0, 0.96.0
>
>         Attachments: 9480-1.txt, trunk-9480.patch, trunk-9480_v1.1.patch, 
> trunk-9480_v1.2.patch, trunk-9480_v2.patch
>
>
> Came across this issue (HBASE-9338 test):
> 1. Client issues a request to move a region from ServerA to ServerB
> 2. ServerA is compacting that region and doesn't close region immediately. In 
> fact, it takes a while to complete the request.
> 3. The master, in the meantime, sends another close request.
> 4. ServerA sends it a NotServingRegionException
> 5. Master handles the exception, deletes the znode, and invokes regionOffline 
> for the said region.
> 6. ServerA fails to operate on ZK in the CloseRegionHandler since the node is 
> deleted.
> The region is permanently offline.
> There are potentially other situations where, when a RegionServer is offline 
> and the client asks to move a region off that server, the master makes the 
> region offline.
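
The six steps above can be replayed in a toy model. This is only a sketch under stated assumptions: the maps stand in for ZooKeeper and for the master's in-memory region states, the state strings are illustrative, and NotServingRegion is a local stand-in for the exception named in step 4, not the HBase class.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy replay of the double-close race; a single-region model, not HBase code. */
class DoubleCloseRaceSketch {
    /** Local stand-in for the exception named in step 4. */
    static class NotServingRegion extends Exception {}

    final Map<String, String> znodes = new HashMap<>();     // toy ZooKeeper: region -> state
    final Map<String, String> masterView = new HashMap<>(); // master's view of region state
    boolean closeInProgress = false;                        // ServerA: close deferred by compaction

    /** ServerA: the first close is deferred; a repeated close is rejected. */
    void serverACloseRequest(String region) throws NotServingRegion {
        if (closeInProgress) {
            throw new NotServingRegion();
        }
        closeInProgress = true;
    }

    /** ServerA: the deferred close finishes and tries to update the znode. */
    boolean serverAFinishClose(String region) {
        // Map.replace only succeeds if the znode still exists.
        return znodes.replace(region, "CLOSED") != null;
    }

    /** Master: send a close; on rejection, delete the znode and offline the region. */
    void masterSendClose(String region) {
        znodes.putIfAbsent(region, "CLOSING");
        masterView.put(region, "PENDING_CLOSE");
        try {
            serverACloseRequest(region);
        } catch (NotServingRegion e) {
            znodes.remove(region);
            masterView.put(region, "OFFLINE");
        }
    }

    public static void main(String[] args) {
        DoubleCloseRaceSketch s = new DoubleCloseRaceSketch();
        s.masterSendClose("r1");                      // steps 1-2: close deferred by compaction
        s.masterSendClose("r1");                      // steps 3-5: rejected, znode deleted
        boolean updated = s.serverAFinishClose("r1"); // step 6: znode is gone
        System.out.println(updated + " " + s.masterView.get("r1")); // prints false OFFLINE
    }
}
```

After the second close, nothing in the model can bring the region back: ServerA's late update has no znode to act on, and the master already considers the region offline.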

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
