[
https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789730#comment-16789730
]
Amelchev Nikita commented on IGNITE-11460:
------------------------------------------
[~amashenkov], thanks for taking a look. The ordering guarantees of discovery
events work fine.
The problem is that client disconnection is not processed as a discovery event.
It is processed in the {{onDisconnected}} method, on disco-notifier-thread.
I attached a reproducer to the PR. Steps to reproduce:
1. Do something that generates an event which changes the coordinator, for
example, start a node. Add a custom listener that holds up this event. The
node's {{notifier thread}} processes the messages and puts the {{node_joined}}
event into the events queue, where the custom listener holds up its processing.
2. Restart the cluster. The client's notifier thread processes this and changes
the coordinator to {{disconnected-coordinator}}. After that, it puts the
{{client-disconnected-event}} into the events queue.
3. The client finds a server, processes the local join in the notifier thread
and sets the new coordinator.
4. The {{node_joined}} event from step 1 is then processed by disco-event-thread,
and the coordinator can be changed to the wrong one. Only after this does the
client get the {{client disconnect event}} and the {{client reconnect}} event.
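The four steps above can be condensed into a small, self-contained Java simulation. This is a sketch under stated assumptions, not the reproducer attached to the PR: the class {{CoordinatorRaceDemo}}, the static {{coordinator}} field and the coordinator names are hypothetical stand-ins; a single-threaded executor plays disco-event-thread, the calling thread plays the notifier thread, and a {{CountDownLatch}} models the custom listener that holds up {{node_joined}}.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CoordinatorRaceDemo {
    // Hypothetical stand-in for the client's current MVCC coordinator (not Ignite's field).
    static volatile String coordinator = "crd-0";

    public static String run() throws Exception {
        coordinator = "crd-0";
        // Plays disco-event-thread: processes queued discovery events one by one.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        // Models the custom listener that holds up the node_joined event.
        CountDownLatch holdEvent = new CountDownLatch(1);

        // Step 1: node_joined is queued with a coordinator computed from the old
        // topology, but its processing is blocked by the listener.
        Future<?> nodeJoined = eventThread.submit(() -> {
            holdEvent.await();
            coordinator = "crd-stale";
            return null;
        });

        // Step 2: the notifier thread (here, the calling thread) handles the disconnect.
        coordinator = "disconnected-coordinator";
        // Step 3: the notifier thread processes the local join and sets the new coordinator.
        coordinator = "crd-new";

        // Step 4: the held-up node_joined event is released and overwrites the coordinator.
        holdEvent.countDown();
        nodeJoined.get();
        eventThread.shutdown();
        return coordinator;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints crd-stale
    }
}
```

Because nothing orders the delayed event against the notifier thread's writes, the last writer wins and the client ends up with the coordinator computed from the old topology.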
Coordinator changes are out of sync because they are not processed only in the
event thread, which has ordering guarantees. The coordinator can also be changed
by disco-notifier-thread through internal methods ({{onDisconnected}},
{{onLocalJoin}}), and there are no ordering guarantees between these two threads.
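One way to restore the missing ordering, shown here only as a hypothetical sketch and not as the actual fix for this ticket, is to route every coordinator change (discovery events, disconnect and local join) through the same single-threaded queue. {{OrderedCoordinatorUpdates}} and its methods are illustrative names, not Ignite API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OrderedCoordinatorUpdates {
    // Hypothetical coordinator holder, written only from the single queue thread.
    private volatile String coordinator = "crd-0";

    // One thread for ALL coordinator changes: discovery events, disconnect and local join.
    private final ExecutorService queue = Executors.newSingleThreadExecutor();

    // Event path: even if this task is slow, later tasks wait behind it (FIFO order).
    public void onNodeJoinedEvent(String crdFromOldTopology) {
        queue.execute(() -> coordinator = crdFromOldTopology);
    }

    // Disconnect path, enqueued instead of being applied directly on the notifier thread.
    public void onDisconnected() {
        queue.execute(() -> coordinator = "disconnected-coordinator");
    }

    // Local join path, also enqueued behind everything submitted before it.
    public void onLocalJoin(String newCrd) {
        queue.execute(() -> coordinator = newCrd);
    }

    // Drains the queue and returns the final coordinator.
    public String awaitAndGet() throws InterruptedException {
        queue.shutdown();
        queue.awaitTermination(5, TimeUnit.SECONDS);
        return coordinator;
    }

    public static void main(String[] args) throws InterruptedException {
        OrderedCoordinatorUpdates u = new OrderedCoordinatorUpdates();
        u.onNodeJoinedEvent("crd-stale"); // step 1: queued first
        u.onDisconnected();               // step 2
        u.onLocalJoin("crd-new");         // step 3
        System.out.println(u.awaitAndGet()); // prints crd-new
    }
}
```

Since a single-thread executor runs tasks in submission order, the delayed {{node_joined}} update is applied before the disconnect rather than after the local join, even if its handler is held up.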
> MVCC: Possible race on coordinator changing on client reconnection.
> -------------------------------------------------------------------
>
> Key: IGNITE-11460
> URL: https://issues.apache.org/jira/browse/IGNITE-11460
> Project: Ignite
> Issue Type: Bug
> Reporter: Amelchev Nikita
> Assignee: Amelchev Nikita
> Priority: Major
> Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I found that the wrong coordinator can be set on client reconnect:
> {noformat}
> assert newCrd.topologyVersion().compareTo(curCrd.topologyVersion()) > 0;
> java.lang.AssertionError
> 	at org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorChanged(MvccProcessorImpl.java:541)
> 	at org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onLocalJoin(MvccProcessorImpl.java:416)
> 	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:851)
> 	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
> 	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2681)
> 	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2719)
> 	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> 	at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I have attached a reproducer to the PR.
> The main reason is that the coordinator can be changed from the discovery event
> thread after the client has already disconnected (the disconnection is processed
> in the notifier thread, which changes the coordinator in the onDisconnected method).
> The coordinator can be changed in two places:
> 1. disco notifier thread: the onDisconnected method;
> 2. disco event thread: the onDiscovery listener.
> Events can be processed with some delay and override the coordinator that was
> set in the notifier thread.
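An alternative guard, again a hypothetical sketch rather than Ignite's implementation, keeps the two threads but tags each queued discovery event with a reconnect epoch that onDisconnected increments; the event thread then drops any event whose epoch is stale. The class {{EpochGuardedCoordinator}}, the epoch counter and all method names are assumptions for illustration, and the sketch assumes the epoch is captured when the event is put into the queue.

```java
public class EpochGuardedCoordinator {
    // Hypothetical reconnect epoch, incremented on every client disconnect.
    private long epoch;
    private String coordinator = "crd-0";

    // Called when a discovery event is put into the events queue.
    public synchronized long captureEpoch() {
        return epoch;
    }

    // Notifier thread: a disconnect invalidates every event queued before it.
    public synchronized void onDisconnected() {
        epoch++;
        coordinator = "disconnected-coordinator";
    }

    // Notifier thread: the local join sets the new coordinator in the current epoch.
    public synchronized void onLocalJoin(String newCrd) {
        coordinator = newCrd;
    }

    // Event thread: apply the change only if no disconnect happened since enqueue.
    public synchronized boolean onDiscoveryEvent(long eventEpoch, String crd) {
        if (eventEpoch != epoch)
            return false; // stale event from before the disconnect: drop it

        coordinator = crd;
        return true;
    }

    public synchronized String coordinator() {
        return coordinator;
    }

    public static void main(String[] args) {
        EpochGuardedCoordinator c = new EpochGuardedCoordinator();
        long queuedAt = c.captureEpoch();   // node_joined enqueued
        c.onDisconnected();                 // notifier: disconnect
        c.onLocalJoin("crd-new");           // notifier: local join
        boolean applied = c.onDiscoveryEvent(queuedAt, "crd-stale"); // delayed event
        System.out.println(applied + " " + c.coordinator()); // prints false crd-new
    }
}
```

Compared with a single queue, this turns the late event into a no-op instead of reordering it, at the cost of having to thread the epoch through every queued event.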
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)