[jira] [Commented] (IGNITE-10047) MVCC: Wrong coordinator assignment when two oldest nodes fail.

Igor Seliverstov (JIRA) Sat, 10 Nov 2018 10:34:16 -0800


    [ 
https://issues.apache.org/jira/browse/IGNITE-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682539#comment-16682539
 ]


Igor Seliverstov commented on IGNITE-10047:
-------------------------------------------

[~rkondakov], my comments:

1) The MVCC coordinator should not be used in the tx prepare routines: that is 
old, dead code, so the changes there make no sense.

2) Coordinator assignment and query collection are two independent operations 
and shouldn't be mixed.

3) The coordinator's active queries should not be part of partition single 
messages when the exchange coordinator isn't the MVCC coordinator as well.

4) Process the previous coordinator's failure (assign a new coordinator) in 
CacheCoordinatorNodeFailListener instead of in a callback from the exchange 
future.

5) This seems to be an old bug: CacheCoordinatorNodeFailListener shouldn't be 
invoked on non-coordinator nodes, except in the case from the previous point.

> MVCC: Wrong coordinator assignment when two oldest nodes fail.
> --------------------------------------------------------------
>
>                 Key: IGNITE-10047
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10047
>             Project: Ignite
>          Issue Type: Bug
>          Components: mvcc
>            Reporter: Roman Kondakov
>            Assignee: Roman Kondakov
>            Priority: Major
>             Fix For: 2.8
>
>
> Reproducer: 
> {{CacheContinuousQueryFailoverMvccTxSelfTest#testLeftPrimaryAndBackupNodes}}. 
> This test can sporadically hang when the topology is unstable.
> The problem is that when the two oldest nodes, A and B, fail, the other nodes 
> elect node B as the new coordinator even though it is down. This happens 
> because the new MVCC coordinator is assigned in the 
> {{GridDhtPartitionsExchangeFuture#init}} method, which is called only once 
> when multiple nodes fail simultaneously. 
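
The failure mode described above can be illustrated with a small stand-alone 
sketch. This is not Ignite code: the class CoordinatorElectionSketch and its 
methods are hypothetical, nodes are plain strings listed oldest first, and 
electOnce is only a simplified model of "one election for several 
simultaneous failures". It assumes the coordinator is the oldest alive node.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only; none of these names exist in Ignite.
final class CoordinatorElectionSketch {
    /** Returns the oldest node that is not in the failed set, or null if none is alive. */
    static String oldestAlive(List<String> topologyOldestFirst, Set<String> failed) {
        for (String node : topologyOldestFirst) {
            if (!failed.contains(node))
                return node;
        }
        return null;
    }

    /** Simplified model of the reported symptom: one election that sees only the first failure. */
    static String electOnce(List<String> topologyOldestFirst, List<String> failures) {
        Set<String> failed = new HashSet<>();
        if (!failures.isEmpty())
            failed.add(failures.get(0));
        return oldestAlive(topologyOldestFirst, failed);
    }

    /** Handles every failure event and re-elects whenever the current coordinator has failed. */
    static String electPerFailure(List<String> topologyOldestFirst, List<String> failures) {
        Set<String> failed = new HashSet<>();
        String crd = oldestAlive(topologyOldestFirst, failed);
        for (String node : failures) {
            failed.add(node);
            if (node.equals(crd))
                crd = oldestAlive(topologyOldestFirst, failed);
        }
        return crd;
    }

    public static void main(String[] args) {
        List<String> topology = Arrays.asList("A", "B", "C", "D");
        List<String> failures = Arrays.asList("A", "B");

        System.out.println(electOnce(topology, failures));       // prints "B" (already down)
        System.out.println(electPerFailure(topology, failures)); // prints "C"
    }
}
```

In this model, handling each node failure as its own event is what keeps the 
result consistent with the alive set. That is the same direction as moving 
the assignment into a node-fail listener.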



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
