Alexander Lapin created IGNITE-13374:
----------------------------------------
Summary: Initial PME hangs because of multiple blinking nodes
Key: IGNITE-13374
URL: https://issues.apache.org/jira/browse/IGNITE-13374
Project: Ignite
Issue Type: Bug
Reporter: Alexander Lapin
Assignee: Alexander Lapin
Fix For: 2.10
*Root cause* of the issue is a race inside GridDhtPartitionsExchangeFuture on
the client side between two processes:
# When the old coordinator fails and a new one takes over, the new coordinator
sends GridDhtPartitionsSingleRequest messages to all nodes, including clients,
to restore exchange results. On a client, processing this message includes
updating the current coordinator reference (the crd field).
# When the future receives the discovery notification about the old
coordinator's failure, it should detect the coordinator change and send a
GridDhtPartitionsSingleMessage to the new coordinator to obtain affinity. If
the crd field has already been updated by the request from the first process,
the client does not detect the coordinator failure and never sends the
SingleMessage to the new coordinator, so the client hangs.
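
The ordering dependence described above can be illustrated with a small, single-threaded toy model. This is only a sketch and not Ignite code: the class ClientExchangeFuture, its methods, and the node names "oldCrd" and "newCrd" are hypothetical, and the detection rule (comparing the failed node with the current crd value) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the client-side race. Not Ignite code; all names are hypothetical. */
final class CoordinatorRaceSketch {
    /** Stand-in for the client-side exchange future. */
    static final class ClientExchangeFuture {
        /** Current coordinator reference, mirroring the crd field from the description. */
        private String crd;

        /** Records every node this client sent a single message to. */
        private final List<String> singleMessagesSentTo = new ArrayList<>();

        ClientExchangeFuture(String initialCrd) {
            this.crd = initialCrd;
        }

        /** Process 1: a single request from the new coordinator updates crd as a side effect. */
        void onSingleRequest(String sender) {
            crd = sender;
        }

        /**
         * Process 2: discovery reports a failed node.
         * Assumption: a coordinator change is detected only if the failed node is
         * still the one referenced by crd.
         */
        void onNodeFailed(String failedNode, String newCrd) {
            if (failedNode.equals(crd)) {
                crd = newCrd;
                singleMessagesSentTo.add(newCrd);
            }
        }

        List<String> singleMessagesSentTo() {
            return singleMessagesSentTo;
        }
    }

    /** Replays the two events in the given order and returns the recipients of single messages. */
    static List<String> run(boolean requestArrivesFirst) {
        ClientExchangeFuture fut = new ClientExchangeFuture("oldCrd");

        if (requestArrivesFirst) {
            fut.onSingleRequest("newCrd");          // crd is overwritten here...
            fut.onNodeFailed("oldCrd", "newCrd");   // ...so the failure no longer matches crd
        } else {
            fut.onNodeFailed("oldCrd", "newCrd");
            fut.onSingleRequest("newCrd");
        }

        return fut.singleMessagesSentTo();
    }

    public static void main(String[] args) {
        System.out.println("discovery first: " + run(false));
        System.out.println("request first:   " + run(true));
    }
}
```

In this model, the single message is sent only when the discovery event is handled first. When the request is handled first, the list of recipients stays empty, which corresponds to the hanging client.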
--
This message was sent by Atlassian Jira
(v8.3.4#803005)