[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.

Andrew Mashenkov (JIRA) Fri, 15 Mar 2019 04:35:50 -0700


    [ https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793557#comment-16793557 ]


Andrew Mashenkov commented on IGNITE-11460:
-------------------------------------------

[~NSAmelchev],

I think there is a bug in Discovery: the onLocalJoin semantics are broken on 
the client.
Discovery events should be ordered, so we should not get events from the old 
topology after the 're-connect' event, but no events (node_left/failed) should 
be ignored either.

So the correct fix is to wait somehow until all discovery events from the event 
storage have been processed before handling onLocalJoin.
Another possible way is to rework 'event' processing to run in a single thread 
while preserving event order.
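
The single-thread option can be sketched roughly as below. This is a minimal illustration, not Ignite code: OrderedDiscoveryDispatcher, its methods and the event names are hypothetical, and events are plain strings. The only point is that when every callback goes through one FIFO worker, old-topology events queued before the local join are handled before it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch, not Ignite code: one worker thread handles every discovery callback. */
final class OrderedDiscoveryDispatcher {
    /** Single worker thread: tasks run one at a time, in submission order. */
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Events in the order they were actually handled. */
    private final List<String> handled = Collections.synchronizedList(new ArrayList<>());

    /** Enqueues an event; callers on any thread share the same FIFO queue. */
    void submit(String evt) {
        worker.execute(() -> handled.add(evt));
    }

    /** Stops accepting events, waits for the queued ones, returns the handling order. */
    List<String> shutdownAndGetHandled() {
        worker.shutdown();

        try {
            if (!worker.awaitTermination(5, TimeUnit.SECONDS))
                throw new IllegalStateException("Queued events were not processed in time");
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException(e);
        }

        return new ArrayList<>(handled);
    }

    /** Old-topology events are queued before the local join, so they are handled first. */
    static List<String> demo() {
        OrderedDiscoveryDispatcher d = new OrderedDiscoveryDispatcher();

        d.submit("NODE_LEFT(old topology)");
        d.submit("NODE_FAILED(old topology)");
        d.submit("LOCAL_JOIN(reconnect)");

        return d.shutdownAndGetHandled();
    }
}
```

In this sketch the local join is just another queued task, so "wait for earlier events" falls out of the queue order and needs no explicit barrier. Whether the real discovery callbacks can be funneled this way is an open question here.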

> MVCC: Possible race on coordinator changing on client reconnection.
> -------------------------------------------------------------------
>
>                 Key: IGNITE-11460
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11460
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Amelchev Nikita
>            Assignee: Amelchev Nikita
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain
>             Fix For: 2.8
>
>         Attachments: stacktraces.log
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that the wrong coordinator can be set in case of client reconnect:
> {noformat}
> assert newCrd.topologyVersion().compareTo(curCrd.topologyVersion()) > 0;
> java.lang.AssertionError
>     at org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorChanged(MvccProcessorImpl.java:541)
>     at org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onLocalJoin(MvccProcessorImpl.java:416)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:851)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2681)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2719)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I have attached a reproducer in the PR.
> The main cause is that the coordinator can be changed from the discovery event 
> thread after the client has already disconnected (disconnection is processed 
> in the notifier thread, which changes the coordinator in the onDisconnected 
> method).
> The coordinator can be changed in two places:
> 1. notifier disco thread: onDisconnected method
> 2. event disco thread: onDiscovery listener.
> Events can be processed with some delay and override the coordinator that was 
> set in the notifier thread. 
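
The overwrite described in the quoted report can be replayed deterministically in one thread, as in the rough sketch below. It is not Ignite code and not the fix proposed in this thread: CoordinatorHolder, Crd, the numeric version and the "DISCONNECTED" marker are all hypothetical. The unguarded path lets the delayed event win. The guarded path shows one possible way to reject a stale candidate, assuming every coordinator change carries a monotonically growing version.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch, not Ignite code: coordinator holder with and without a staleness check. */
final class CoordinatorHolder {
    /** Coordinator candidate tagged with a hypothetical, monotonically growing version. */
    static final class Crd {
        final String nodeId;
        final long ver;

        Crd(String nodeId, long ver) {
            this.nodeId = nodeId;
            this.ver = ver;
        }
    }

    /** Starts with coordinator "A" at version 1. */
    private final AtomicReference<Crd> cur = new AtomicReference<>(new Crd("A", 1));

    /** Unconditional write: the last writer wins, even if its data is stale. */
    void set(Crd crd) {
        cur.set(crd);
    }

    /** Guarded write: accepts the candidate only if it is strictly newer than the current one. */
    boolean setIfNewer(Crd crd) {
        while (true) {
            Crd old = cur.get();

            if (crd.ver <= old.ver)
                return false;

            if (cur.compareAndSet(old, crd))
                return true;
        }
    }

    String currentNodeId() {
        return cur.get().nodeId;
    }

    /** Replays the report: the disconnect is handled first, then a delayed old-topology event. */
    static String replay(boolean guarded) {
        CoordinatorHolder h = new CoordinatorHolder();

        Crd disconnected = new Crd("DISCONNECTED", 3); // set by the notifier thread
        Crd delayed = new Crd("B", 2);                 // late event from the old topology

        if (guarded) {
            h.setIfNewer(disconnected);
            h.setIfNewer(delayed);
        }
        else {
            h.set(disconnected);
            h.set(delayed);
        }

        return h.currentNodeId();
    }
}
```

With these assumed values, replay(false) ends with coordinator "B" (the stale event overrode the disconnect), while replay(true) keeps "DISCONNECTED".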



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
