[
https://issues.apache.org/jira/browse/HBASE-21863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764234#comment-16764234
]
stack commented on HBASE-21863:
-------------------------------
I took a look at the patch. On the second part, where you add an RPC deadline:
yeah, we've avoided going this route up to now because it compounds the
possible states we'd have to deal with, upping the possibility of double
assign, a condition that is rare in AMv2 (though you seem to have found a case,
but it looks like there was a bug, fixed over in HBASE-21862).
On this bit of the patch:
    LOG.warn("Received report {} transition from {} for {}, pid={} but the region is not on it,"
      + " killing RS", TransitionCode.OPENED, serverName, regionNode, getProcId());
    // We may be killing an innocent RS due to some network race condition (to fix that, we'd
    // need HBASE-21864). However, that is relatively harmless compared to HBASE-21862.
    // Play it safe and assume we could have a double-assignment situation.
    // Note that we don't do it in regular RS report, because races there are much more frequent.
    throw new YouAreDeadException("Potentially double-assigning " + regionNode);
...... I think a version of this makes sense. I'm not sure TRSP is the place
for it, though, as we may get the message when there is no waiting TRSP. I'd
throw something other than a YADE, perhaps a more specific subclass, since YADE
has up to now had only one usage. I'd also wait until we have an instance of a
report from an RS with an unaccounted-for Region opening... I'd like to know
how it comes about first before building the handling.
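
For illustration only, here is a minimal sketch of the "more specific subclass"
idea. It is not from the patch. The YouAreDeadException class below is a local
stand-in so the example compiles on its own (the real one lives in
org.apache.hadoop.hbase), and PotentialDoubleAssignException,
DoubleAssignSketch and checkOpenedReport are hypothetical names made up for
this example.

```java
import java.io.IOException;

// Local stand-in for org.apache.hadoop.hbase.YouAreDeadException, declared here
// only so that the sketch is self-contained.
class YouAreDeadException extends IOException {
  YouAreDeadException(String message) {
    super(message);
  }
}

// Hypothetical subclass: callers can still catch it as a YADE, but it can also be
// told apart from the existing usage of the parent exception.
class PotentialDoubleAssignException extends YouAreDeadException {
  PotentialDoubleAssignException(String regionName, String reportingServer) {
    super("Potentially double-assigning " + regionName
        + "; OPENED reported by " + reportingServer);
  }
}

public class DoubleAssignSketch {
  // Hypothetical check: reject an OPENED report that comes from a server other
  // than the one the master expects the region to be on.
  static void checkOpenedReport(String regionName, String expectedServer,
      String reportingServer) throws PotentialDoubleAssignException {
    if (!expectedServer.equals(reportingServer)) {
      throw new PotentialDoubleAssignException(regionName, reportingServer);
    }
  }

  public static void main(String[] args) {
    try {
      checkOpenedReport("region-a", "rs1", "rs2");
      System.out.println("report accepted");
    } catch (PotentialDoubleAssignException e) {
      // The specific type lets this path be logged and handled separately.
      System.out.println("specific handling: " + e.getMessage());
    }
  }
}
```

Because the hypothetical exception still extends the stand-in YADE, existing
catch blocks for the parent type would keep working, while new code could
single out the double-assign case.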
Thanks.
> narrow down the double-assignment race window
> ---------------------------------------------
>
> Key: HBASE-21863
> URL: https://issues.apache.org/jira/browse/HBASE-21863
> Project: HBase
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: HBASE-21863.patch
>
>
> See HBASE-21862.