[ 
https://issues.apache.org/jira/browse/HBASE-21864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764592#comment-16764592
 ] 

stack commented on HBASE-21864:
-------------------------------

[~Apache9] set me right again (brain-fart)

Versioning the RPC messages sounds like a nice addition to help w/ the race 
described here. Was wondering what to do when a server restarts. Each server does 
have a 'startcode' that increases across restarts. Perhaps this could be the 
MSBs in the RPC sequence number?
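
A rough sketch of the startcode-as-MSBs idea, not anything from the HBase code 
base. The names (make_seq, split_seq) and the 20-bit counter width are made up 
for illustration; the only assumption is that the startcode is an integer that 
grows across restarts. A sequence number built this way keeps ordering across a 
restart even though the per-process counter starts over at zero.

```python
# Hypothetical layout: [ startcode | per-process counter ].
# COUNTER_BITS is an assumed width, chosen only for this sketch.
COUNTER_BITS = 20
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_seq(startcode: int, counter: int) -> int:
    """Pack the startcode into the high bits and the counter into the low bits."""
    if startcode < 0:
        raise ValueError("startcode must be non-negative")
    if not 0 <= counter <= COUNTER_MASK:
        raise ValueError("counter does not fit in COUNTER_BITS")
    return (startcode << COUNTER_BITS) | counter


def split_seq(seq: int) -> tuple[int, int]:
    """Recover (startcode, counter) from a packed sequence number."""
    return seq >> COUNTER_BITS, seq & COUNTER_MASK


if __name__ == "__main__":
    before_restart = make_seq(1000, 5)
    after_restart = make_seq(1001, 0)  # counter reset, startcode grew
    print(before_restart < after_restart)
    print(split_seq(before_restart))
```

Any message sent after a restart then compares greater than every message sent 
before it, so the receiver needs only a plain integer comparison.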

Another thought for avoiding the race was having only one channel of 
communication rather than the few we have now. We could add the result of an 
RPC to the heartbeat message, perhaps on the end of the server report. Region 
open/close used to work this way -- before being zk'd -- reporting 
open/close/error on the tail of the region server report, preempting the 
heartbeat interval if there was a message to send.
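
To make the single-channel idea concrete, here is a minimal sketch under my own 
assumptions: the class and field names (Reporter, "load", "results") are 
hypothetical and do not mirror the old region server report code. Results queue 
up on the server, ride on the tail of the next report, and a non-empty queue 
makes the report due immediately instead of waiting out the interval.

```python
class Reporter:
    """Hypothetical server-side reporter with a single outbound channel."""

    def __init__(self, interval: float):
        self.interval = interval  # normal heartbeat interval, in seconds
        self.last_sent = 0.0      # time the last report went out
        self.pending = []         # open/close/error results not yet reported

    def add_result(self, result: str) -> None:
        self.pending.append(result)

    def due(self, now: float) -> bool:
        # A pending result preempts the heartbeat interval.
        return bool(self.pending) or now - self.last_sent >= self.interval

    def build_report(self, now: float, load: int) -> dict:
        # Results go on the tail of the regular report, in the order queued.
        report = {"load": load, "results": self.pending}
        self.pending = []
        self.last_sent = now
        return report


if __name__ == "__main__":
    r = Reporter(interval=3.0)
    print(r.due(1.0))            # nothing pending, interval not elapsed
    r.add_result("CLOSED R1")
    print(r.due(1.0))            # pending result makes the report due
    print(r.build_report(1.0, load=42))
```

Since everything travels in one ordered stream, a report can never overtake the 
result of an earlier close.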

I think versioning all messages out of a server is a better way to go. I see it 
being of use in scenarios beyond this one.
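
For illustration only, a sketch of how a receiver might use such versions to 
drop a late report. MasterView and accept are hypothetical names, and the rule 
used here (ignore anything older than the newest version already seen from 
that server) is an assumption, not a settled design.

```python
class MasterView:
    """Hypothetical master-side record of the newest version seen per server."""

    def __init__(self):
        self.last_seen = {}  # server name -> highest message version accepted

    def accept(self, server: str, version: int) -> bool:
        """Return True if the message is fresh, False if it is stale."""
        if version < self.last_seen.get(server, -1):
            return False  # older than something already processed: ignore it
        self.last_seen[server] = version
        return True


if __name__ == "__main__":
    m = MasterView()
    # RS sends report {R1} at version 7, but it is delayed on the network.
    # Master closes R1; the RS acks the close at version 8, which arrives first.
    print(m.accept("rs1", 8))
    # The delayed report finally arrives and is recognized as stale.
    print(m.accept("rs1", 7))
```

With a check like this the late report {R1} is ignored rather than treated as 
evidence that the server is in a bad state.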

Thanks.

> add region state version and reinstate YouAreDead exception in region report
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-21864
>                 URL: https://issues.apache.org/jira/browse/HBASE-21864
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Major
>
> The state version will ensure we don't have network-related races (e.g. the 
> one I reported in some other bug -
> {code}
> RS: send report {R1} ...
> M: close R1
> RS: I closed R1
> M ... receive report {R1}
> M: you shouldn't have R1, die
> {code}).
> Then we can revert the change that removed YouAreDead exception... RS in 
> incorrect state should be either brought into correct state or killed because 
> it means there's some bug; right now if double assignment happens (I found 2 
> different cases just this week ;)) master lets RS with incorrect assignment 
> keep it forever.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
