[ https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116721#comment-13116721 ]
stack commented on HBASE-4497:
------------------------------
bq. startcode and timestamp is what i initially thought of. seems like there
could be some weird situations. for example, what is to say that the server
already in META didn't somehow become the new assignment destination?
Wouldn't the timestamp be different in this case? (It will have been updated by
the new open.)
bq. or what if... M tells RS1 to OPEN R1 and to expect RS3:StartCode3....
I'm not suggesting the master tell the RS anything new. I'm suggesting that on
receiving the open, the RS itself read .META. at the start of the open
transaction, before it does anything else, and use this read as input for the
later checkAndSet write.
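
To make the read-then-conditional-write idea concrete, here is a minimal sketch. It assumes an in-memory stand-in for .META.; `MetaSketch`, `Location`, and `RegionOpenSketch` are hypothetical names made up for illustration and are not the HBase client API. The RS remembers what it read at the start of the open and only writes its own location if the row is still unchanged.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory stand-in for .META.; this is not the HBase client API. */
final class MetaSketch {
    /** What a region's row records: hosting server, its startcode, and the write timestamp. */
    static final class Location {
        final String server;
        final long startCode;
        final long timestamp;

        Location(String server, long startCode, long timestamp) {
            this.server = server;
            this.startCode = startCode;
            this.timestamp = timestamp;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Location)) return false;
            Location other = (Location) o;
            return server.equals(other.server)
                && startCode == other.startCode
                && timestamp == other.timestamp;
        }

        @Override
        public int hashCode() {
            return Objects.hash(server, startCode, timestamp);
        }
    }

    private final ConcurrentMap<String, Location> rows = new ConcurrentHashMap<>();

    Location read(String region) {
        return rows.get(region);
    }

    /** Atomically writes update only if the row still equals expected (null means "no row yet"). */
    boolean checkAndPut(String region, Location expected, Location update) {
        if (expected == null) {
            return rows.putIfAbsent(region, update) == null;
        }
        return rows.replace(region, expected, update);
    }
}

/** One region server's open transaction for a single region (assumed shape). */
final class RegionOpenSketch {
    private final MetaSketch meta;
    private final String server;
    private final long startCode;
    private MetaSketch.Location seenAtStart;

    RegionOpenSketch(MetaSketch meta, String server, long startCode) {
        this.meta = meta;
        this.server = server;
        this.startCode = startCode;
    }

    /** First step of the open: remember what META says before doing anything else. */
    void beginOpen(String region) {
        seenAtStart = meta.read(region);
    }

    /** Last step: record our location only if META is unchanged since beginOpen. */
    boolean finishOpen(String region, long now) {
        return meta.checkAndPut(region, seenAtStart,
            new MetaSketch.Location(server, startCode, now));
    }
}
```

In the scenario from the issue, RS1 and RS2 both call `beginOpen`, RS2 calls `finishOpen` first and succeeds, and RS1's later `finishOpen` returns false, leaving RS2 in the row. Because the timestamp is part of the compared value, a stale open also loses when the same server and startcode have since been written by a newer open.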
bq. one neat idea would be to introduce this region assignment incrementing ID
into META. it would provide a nice way to debug the movement of a region across
the cluster over time and could also provide the necessary info to use
CheckAndPut.
This could work. The downsides are that M has to write meta first, before doing
the assign, which will be a bit of new burden on meta (doubled write load?), and
this new write is now inline with an assign; we'd have to do some hackery in
here around bulk assign.
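
A rough sketch of how the incrementing assignment ID could gate the RS write follows. It is an in-memory model only: `AssignmentIdSketch`, `Row`, and the method names are hypothetical, and it assumes the RS learns the ID for its open (how it would learn it is not settled in this thread). The extra master write before each assign is the doubled write load mentioned above.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical model of a META row carrying an incrementing assignment ID. */
final class AssignmentIdSketch {
    static final class Row {
        final long assignmentId;
        final String server; // null until some RS records a successful open

        Row(long assignmentId, String server) {
            this.assignmentId = assignmentId;
            this.server = server;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Row)) return false;
            Row other = (Row) o;
            return assignmentId == other.assignmentId
                && Objects.equals(server, other.server);
        }

        @Override
        public int hashCode() {
            return Objects.hash(assignmentId, server);
        }
    }

    private final ConcurrentMap<String, Row> meta = new ConcurrentHashMap<>();

    /** Master side: bump the ID in META before sending the open (the extra write). */
    long startAssignment(String region) {
        Row updated = meta.merge(region, new Row(1L, null),
            (old, first) -> new Row(old.assignmentId + 1, old.server));
        return updated.assignmentId;
    }

    /** RS side: record the open only if META still carries the ID this open was issued under. */
    boolean recordOpen(String region, long assignmentId, String server) {
        Row current = meta.get(region);
        if (current == null || current.assignmentId != assignmentId) {
            return false; // a newer assignment has superseded this one
        }
        // Conditional replace: fails if the row changed between the read and the write.
        return meta.replace(region, current, new Row(assignmentId, server));
    }

    String serverOf(String region) {
        Row row = meta.get(region);
        return row == null ? null : row.server;
    }
}
```

With this shape, a timed-out assignment to RS1 (ID 1) followed by a reassignment to RS2 (ID 2) means RS1's late `recordOpen` is rejected regardless of which server writes first, and the sequence of IDs gives a history of the region's moves.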
> If region opening fails after updating META HBCK reports it as inconsistent
> and scanning the region throws NSRE
> ---------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-4497
> URL: https://issues.apache.org/jira/browse/HBASE-4497
> Project: HBase
> Issue Type: Bug
> Reporter: ramkrishna.s.vasudevan
> Priority: Critical
>
> As per the discussion in the mail chain "HBCK reporting of possible mismatch
> in RS assignment", this JIRA is created.
> Consider two region servers, RS1 and RS2.
> A region tries to open on RS1, but it takes a while. RS1 has still not
> updated META and transitioned the node from OPENING to OPENED,
> so the timeout assigns the region to RS2. RS2 successfully updates META and
> opens the region.
> Now RS1 tries to act on the region by first updating META and then
> transitioning the node from OPENING to OPENED.
> RS1's transition of the node from OPENING to OPENED will fail, but the META
> entry will have RS1 as the latest location.
> Now HBCK reports this as an inconsistency, and if we try to scan the region we
> get NotServingRegionException.