[ https://issues.apache.org/jira/browse/HBASE-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shuaifeng Zhou updated HBASE-14407:
-----------------------------------
Attachment: hbase-14407-1.2.patch
hbase-14407-1.1.patch
hbase-14407-0.98.patch
A possible solution: in processAlreadyOpenedRegion, check the region's state in
ZooKeeper before modifying the region state in master memory.
Patches for branches 0.98, 1.1 and 1.2 are attached. I tested 0.98.10 with this
change on a cluster with more than 10,000 regions, and the problem did not
occur (before the change, it happened every time HBase was restarted).
The master branch does not use ZooKeeper for region assignment, so it is not
affected.
Please review; better solutions are welcome.
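A minimal sketch of the proposed check, in Java. Everything here is hypothetical and heavily simplified: `GuardedMaster`, its two maps, and the handler names are illustrative stand-ins, not HBase classes or the attached patch. It assumes the check means "skip the in-memory update while the region's unassigned znode is still pending", leaving the transition to the handler for the ZooKeeper OPENED event.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the proposed guard (not HBase code).
final class GuardedMaster {
    enum State { PENDING_OPEN, OPEN }

    // Master's in-memory view of each region.
    final Map<String, State> memory = new HashMap<>();
    // Assumed stand-in for unassigned znodes: an entry exists while the
    // region's OPENED event has not yet been processed by the master.
    final Map<String, String> unassignedZnodes = new HashMap<>();

    // Called when a region server answers a repeated open request with
    // "already opened". Returns true if the in-memory state was updated.
    boolean processAlreadyOpenedRegion(String region) {
        if (unassignedZnodes.containsKey(region)) {
            // The OPENED event is still pending in ZooKeeper; leave the
            // transition to the event handler so it does not later find
            // the region unexpectedly OPEN.
            return false;
        }
        memory.put(region, State.OPEN);
        return true;
    }

    // Handling of the (possibly delayed) OPENED event from ZooKeeper.
    void onZkRegionOpened(String region) {
        unassignedZnodes.remove(region);
        memory.put(region, State.OPEN);
    }
}
```

With the guard in place, a late ZooKeeper event is the single path that moves a pending region to OPEN, so the two notifications no longer race on the same in-memory update.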
> NotServingRegion: hbase region closed forever
> ---------------------------------------------
>
> Key: HBASE-14407
> URL: https://issues.apache.org/jira/browse/HBASE-14407
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 0.98.10, 1.2.0, 1.1.2, 1.3.0
> Reporter: Shuaifeng Zhou
> Assignee: Shuaifeng Zhou
> Priority: Critical
> Attachments: hbase-14407-0.98.patch, hbase-14407-1.1.patch,
> hbase-14407-1.2.patch, hs4.log, master.log
>
>
> I found a situation that can cause a region to be closed forever. It happens
> frequently on my cluster (version 0.98.10), and 1.1.2 has the same
> problem:
> 1. The master sends an open-region request to the region server (RS).
> 2. The RS starts a handler to open the region.
> 3. The RS returns a response to the master.
> 4. The master does not receive the response, or times out, and sends the
> open-region request again.
> 5. The RS has already opened the region.
> 6. The master runs processAlreadyOpenedRegion and updates the region state
> to OPEN in master memory.
> 7. The master then receives the ZooKeeper "region opened" event (delayed for
> some reason, e.g. network latency) and tries to update the region state to
> OPEN, but finds the region already open: ERROR!
> 8. The master sends a close-region request, and the region stays closed
> forever.
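The last four steps above come down to message ordering. This hypothetical Java sketch (`UnguardedMaster` and its methods are illustrative, not HBase code) applies the "already opened" response before the delayed ZooKeeper event; the second update then finds the region already OPEN, is treated as an error, and a close request is recorded. As a simplification, the close request is only logged in a list rather than sent anywhere.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the race described in the issue (not HBase code).
final class UnguardedMaster {
    enum State { PENDING_OPEN, OPEN }

    final Map<String, State> memory = new HashMap<>();
    // Simplification: close requests are recorded here instead of being sent.
    final List<String> closeRequests = new ArrayList<>();

    // "Already opened" response: updates memory without consulting ZooKeeper.
    void processAlreadyOpenedRegion(String region) {
        memory.put(region, State.OPEN);
    }

    // Delayed ZooKeeper OPENED event: expects the region to be in transition.
    void onZkRegionOpened(String region) {
        if (memory.get(region) == State.OPEN) {
            // Unexpected state is treated as an error and the region is
            // closed; nothing reopens it afterwards.
            closeRequests.add(region);
            return;
        }
        memory.put(region, State.OPEN);
    }
}
```

Calling `onZkRegionOpened` before `processAlreadyOpenedRegion` produces no close request, which is why the failure depends on how late the ZooKeeper event arrives.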