Anton Vinogradov created IGNITE-28855:
-----------------------------------------
Summary: Client node permanently loses discovery events delivered
right after the reconnect response
Key: IGNITE-28855
URL: https://issues.apache.org/jira/browse/IGNITE-28855
Project: Ignite
Issue Type: Task
Reporter: Anton Vinogradov
Assignee: Sergey Chugunov
Regression of IGNITE-26111 (TcpDiscoverySpi uses MessageSerializer).
Before IGNITE-26111, ClientImpl.SocketStream owned a single BufferedInputStream
shared by every reader of the discovery socket. After the refactoring,
Reconnector and SocketReader each create their own TcpDiscoveryIoSession, each
wrapping the raw socket into a new BufferedInputStream.
When the router server writes the reconnect response and a subsequent discovery
message back-to-back (in a single TCP segment, which is typical during a topology
storm, when a NodeLeftMessage is enqueued to the client's message worker right
behind the reconnect response), the Reconnector's session reads ahead and buffers
the bytes of the following message. The Reconnector stops reading at the
reconnect response and hands the socket over to SocketReader, which attaches a
fresh session to the raw socket. The bytes buffered by the Reconnector's session
are silently discarded, and the message is lost forever.
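The read-ahead loss described above can be reproduced outside Ignite with plain
java.io streams. The following is a minimal sketch, not Ignite code: a
ByteArrayInputStream stands in for the raw socket, bytes 1 and 2 stand in for the
reconnect response and the following discovery message, and the class and method
names (ReadAheadLossDemo, readViaFreshWrapper, readTwoMessages) are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical stand-alone illustration; not part of Ignite. */
final class ReadAheadLossDemo {
    /** Reads one byte through a NEW buffered wrapper, the way each per-reader session wraps the raw socket. */
    static int readViaFreshWrapper(InputStream raw) throws IOException {
        return new BufferedInputStream(raw).read();
    }

    /** Returns what the two readers observe: {first, second}. */
    static int[] readTwoMessages() {
        try {
            // Stand-in for the raw socket: byte 1 = reconnect response, byte 2 = next discovery message.
            InputStream raw = new ByteArrayInputStream(new byte[] {1, 2});

            int first = readViaFreshWrapper(raw);  // "Reconnector": its wrapper buffers both bytes.
            int second = readViaFreshWrapper(raw); // "SocketReader": the raw stream is already drained.

            return new int[] {first, second};
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] res = readTwoMessages();

        System.out.println("first=" + res[0] + ", second=" + res[1]); // prints "first=1, second=-1"
    }
}
```

The second read returns -1 here only because the in-memory stream is exhausted.
On a live socket the second reader would instead wait for further data, and the
byte held in the first wrapper's buffer would never surface.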
Consequences on the client:
* a discovery event (e.g. NODE_LEFT) is permanently lost;
* if any later topology message arrives, the client fails with a critical error:
java.lang.AssertionError: lastVer=7, newVer=9 ... at
ClientImpl.updateTopologyHistory(ClientImpl.java:932)
(seen in IgniteCacheGroupsPartitionLossPolicySelfTest:
https://ci2.ignite.apache.org/viewLog.html?buildId=9169034);
* if the lost message was the last one, the client silently stays on a stale
topology (DiscoCache / PME desync).
Fix: create TcpDiscoveryIoSession once per socket (in sendJoinRequest, the same
session that has read the handshake response) and carry it inside SocketStream;
Reconnector and SocketReader reuse it instead of creating their own. This
restores the pre-IGNITE-26111 invariant: a single buffered reader per socket.
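The single-reader invariant can be illustrated with the same plain java.io
setup. This is a sketch under stated assumptions, not the actual patch:
StreamHolder is a hypothetical stand-in for a per-socket holder such as
SocketStream, a ByteArrayInputStream stands in for the raw socket, and bytes 1
and 2 stand in for the reconnect response and the following discovery message.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical stand-alone illustration; not part of Ignite. */
final class SharedReaderDemo {
    /** Stand-in for a per-socket holder that owns the only buffered reader. */
    static final class StreamHolder {
        private final InputStream in;

        StreamHolder(InputStream raw) {
            // The buffered wrapper is created exactly once per "socket".
            this.in = new BufferedInputStream(raw);
        }

        /** Reads one single-byte "message" through the shared wrapper. */
        int readMessage() throws IOException {
            return in.read();
        }
    }

    /** Returns what the two readers observe: {first, second}. */
    static int[] readTwoMessages() {
        try {
            StreamHolder holder = new StreamHolder(new ByteArrayInputStream(new byte[] {1, 2}));

            int first = holder.readMessage();  // "Reconnector" reads the reconnect response.
            int second = holder.readMessage(); // "SocketReader" reuses the holder and sees the buffered byte.

            return new int[] {first, second};
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] res = readTwoMessages();

        System.out.println("first=" + res[0] + ", second=" + res[1]); // prints "first=1, second=2"
    }
}
```

Because both readers go through the same wrapper, bytes read ahead by the first
reader remain available to the second one.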
Reproducer: TcpDiscoveryClientTopologyGapTest (5 servers + 4 clients,
sequential graceful stop of 3 servers including the clients' router and the
coordinator). Without the fix, a client gets stuck one topology version short
within ~150 iterations; with the fix, 3x300 iterations pass cleanly.
TcpClientDiscoverySpiSelfTest (52 tests),
TcpDiscoveryPendingMessageDeliveryTest and
TcpClientDiscoverySpiCoordinatorChangeTest pass as well.