[ 
https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790472#comment-16790472
 ] 

Amelchev Nikita commented on IGNITE-11460:
------------------------------------------

[~amashenkov], In such case we can get a situation when the client already 
joined to topology (will receive mvcc requests) but the current coordinator is 
in a disconnected state until processed discovery event.
And possible another situation when the client already disconnected but the 
coordinator wasn't disconnected until processed client disconnect discovery 
event. 
Also, it seems there is no event for local node join in the discovery event 
thread.

With my changes coordinator will be set correctly on local join/on disconnected 
and will not process unnecessary events that happen(will be processed) after 
the client leave topology. When the client join topology again it will be set a 
correct coordinator and events will be processed only after get a client 
reconnect event.

> MVCC: Possible race on coordinator changing on client reconnection.
> -------------------------------------------------------------------
>
>                 Key: IGNITE-11460
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11460
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Amelchev Nikita
>            Assignee: Amelchev Nikita
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain
>             Fix For: 2.8
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that the wrong coordinator can be set in case of client reconnect:
> {noformat}
> assert newCrd.topologyVersion().compareTo(curCrd.topologyVersion()) > 0;
> java.lang.AssertionError
>     at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorChanged(MvccProcessorImpl.java:541)
>     at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onLocalJoin(MvccProcessorImpl.java:416)
>     at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:851)
>     at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
>     at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2681)
>     at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2719)
>     at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I have attached reproducer in PR.
> The main reason is that coordinator can be changed from discovery event 
> thread when the client already disconnect (disconnection processed in 
> notifier thread and change coordinator on onDisconnected method).
> Coordinator can be changed in cases:
> 1. notifier disco thread: onDisconnected method
> 2. event disco thread: onDiscovery listener.
> and events can be processed with some delay and override coordinator that set 
> in notifier thread. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to