[ 
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116151#comment-13116151
 ] 

Ming Ma commented on HBASE-4497:
--------------------------------

checkAndPut might work. We would need to use it on both ZK and HBase. 
There are other bugs caused by the lack of strong synchronization on the ZK 
nodes between the AssignmentManager and the RSs. Here is another scenario: a 
race between the AM timeoutMonitor and the first RS's openRegion operation.

RS1 successfully transitions the region to the OPENED state at around the same 
time as timeoutMonitor kicks in. timeoutMonitor reads the data from ZK right 
before RS1 sets it to OPENED, so timeoutMonitor sees RS_ZK_REGION_OPENING and 
tries to reassign the region. In that case we end up with the same region on 
two RSs.
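
To make the interleaving concrete, here is a minimal sketch that replays it 
against an in-memory stand-in for the region's znode. VersionedNode, Snapshot, 
setIfVersion and TimeoutRaceDemo are hypothetical names invented for this 
illustration; none of this is HBase or ZooKeeper code. The only assumption is 
that every write bumps a version and that a write can be made conditional on 
the version the writer last read.

```java
/** Hypothetical immutable view of a znode: its state and the version it was read at. */
final class Snapshot {
    final String state;
    final int version;

    Snapshot(String state, int version) {
        this.state = state;
        this.version = version;
    }
}

/** Hypothetical in-memory stand-in for the region's znode (not a ZooKeeper class). */
final class VersionedNode {
    private String state;
    private int version;

    VersionedNode(String initialState) {
        this.state = initialState;
    }

    synchronized Snapshot read() {
        return new Snapshot(state, version);
    }

    /** Writes newState only if nobody has written since expectedVersion was read. */
    synchronized boolean setIfVersion(int expectedVersion, String newState) {
        if (version != expectedVersion) {
            return false; // stale reader: reject instead of overwriting
        }
        state = newState;
        version++;
        return true;
    }
}

final class TimeoutRaceDemo {
    public static void main(String[] args) {
        VersionedNode node = new VersionedNode("RS_ZK_REGION_OPENING");

        // timeoutMonitor and RS1 both read the node while it is still OPENING.
        Snapshot seenByMonitor = node.read();
        Snapshot seenByRs1 = node.read();

        // RS1 finishes first and moves the node to OPENED.
        boolean rs1Ok = node.setIfVersion(seenByRs1.version, "RS_ZK_REGION_OPENED");

        // timeoutMonitor now acts on its stale read and tries to force a reassign.
        boolean monitorOk = node.setIfVersion(seenByMonitor.version, "M_ZK_REGION_OFFLINE");

        System.out.println("RS1 transition applied: " + rs1Ok);
        System.out.println("timeoutMonitor reassign applied: " + monitorOk);
        System.out.println("final state: " + node.read().state);
    }
}
```

With an unconditional write in place of setIfVersion, the second write would 
overwrite OPENED and the region would be handed to a second RS.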


Would the following work?

1. ZKAssign.transitionNode has a kind of checkAndPut semantics when it tries 
to enforce that the original state is the expected one. However, it isn't 
atomic: it first calls getData on ZK and then compares. Instead, we could use 
ZK's conditional update (setData with an expected version, the ZK equivalent 
of checkAndPut) to enforce atomicity.
2. Introduce a ZK-based global AtomicInteger for region operations; e.g., each 
openRegion operation gets a new, incrementing region_operation_ID and 
validates its own ID against the ZK state via checkAndPut. That way, one of 
the two openRegion operations on the RSs will fail.
3. For the HBase .META. update, we can store region_operation_ID in the table 
and require that a new update's region operation ID be greater than the 
previous one for the given region. That way the older RS won't be able to 
update the table. We would need to introduce a new HBase API, similar to 
checkAndPut but more like checkGreaterAndPut.
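
A rough sketch of how proposals 2 and 3 could fit together, using in-memory 
stand-ins only. RegionOperationIds, MetaTableModel, serverFor and 
StaleOpenDemo are hypothetical names made up for this illustration, and 
checkGreaterAndPut here is just a model of the proposed API; it does not exist 
in HBase. A local AtomicLong stands in for the ZK-based global counter.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the ZK-based global counter of proposal 2. */
final class RegionOperationIds {
    private final AtomicLong counter = new AtomicLong();

    /** Each openRegion attempt gets a strictly larger ID than the previous one. */
    long next() {
        return counter.incrementAndGet();
    }
}

/** Hypothetical in-memory model of the .META. rows that hold a region's location. */
final class MetaTableModel {
    private static final class Location {
        final String server;
        final long operationId;

        Location(String server, long operationId) {
            this.server = server;
            this.operationId = operationId;
        }
    }

    private final ConcurrentHashMap<String, Location> rows = new ConcurrentHashMap<>();

    /** Applies the update only if operationId is greater than the stored one. */
    boolean checkGreaterAndPut(String region, String server, long operationId) {
        boolean[] applied = {false};
        // compute() runs atomically per key, so the compare and the write cannot interleave.
        rows.compute(region, (key, old) -> {
            if (old == null || operationId > old.operationId) {
                applied[0] = true;
                return new Location(server, operationId);
            }
            return old; // stale operation: keep the newer entry
        });
        return applied[0];
    }

    String serverFor(String region) {
        Location location = rows.get(region);
        return location == null ? null : location.server;
    }
}

final class StaleOpenDemo {
    public static void main(String[] args) {
        RegionOperationIds ids = new RegionOperationIds();
        MetaTableModel meta = new MetaTableModel();

        long rs1Op = ids.next(); // first assignment, RS1 is slow to open
        long rs2Op = ids.next(); // timeout fires, region is reassigned to RS2

        boolean rs2Applied = meta.checkGreaterAndPut("region-A", "RS2", rs2Op);
        boolean rs1Applied = meta.checkGreaterAndPut("region-A", "RS1", rs1Op);

        System.out.println("RS2 update applied: " + rs2Applied);
        System.out.println("late RS1 update applied: " + rs1Applied);
        System.out.println("META points at: " + meta.serverFor("region-A"));
    }
}
```

In this model the late update from RS1 carries the smaller ID and is rejected, 
so the row keeps pointing at RS2.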

                
> If region opening fails after updating META HBCK reports it as inconsistent 
> and scanning the region throws NSRE
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-4497
>                 URL: https://issues.apache.org/jira/browse/HBASE-4497
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ramkrishna.s.vasudevan
>            Priority: Critical
>
> As per the discussion in the mail chain "HBCK reporting of possible mismatch 
> in RS assignment" this JIRA is created.
> Consider two RS-> RS1 and RS2.
> A region tries to open in RS1. But it takes a while.  The RS1 has still not 
> updated meta and transitioned the node from OPENING to OPENED
> So timeout assigns the region to RS2.  RS2 successfully updates the META and 
> opens the region.
> Now RS1 tries to act on the region by first updating the META and then 
> transitioning the node from OPENING to OPENED.
> RS1's transition of the node from OPENING to OPENED will fail.  But the META 
> entry will have RS1 as the latest.
> Now HBCK reports this as an inconsistency and if we try to scan the Region we 
> get NotServingRegionException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
