[ https://issues.apache.org/jira/browse/IGNITE-25363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Orlov reassigned IGNITE-25363:
-----------------------------------------

    Assignee: Konstantin Orlov

> Sql. Delayed NODE_LEFT event processing may cause query to hung
> ---------------------------------------------------------------
>
>                 Key: IGNITE-25363
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25363
>             Project: Ignite
>          Issue Type: Bug
>          Components: sql ai3
>            Reporter: Konstantin Orlov
>            Assignee: Konstantin Orlov
>            Priority: Major
>              Labels: ignite-3
>
> This problem is highlighted by the test
> {{org.apache.ignite.internal.runner.app.ItDataSchemaSyncTest#checkSchemasCorrectlyRestore}},
> which sometimes fails on TC with a timeout. The sequence of events is as follows:
> # Given: a cluster of 3 nodes; the distribution zone spans all of these nodes.
> # Node 1 is restarted.
> # Notification of the
> {{org.apache.ignite.internal.network.TopologyEventHandler#onDisappeared}}
> handlers is delayed on node 2 (due to metastorage lag or some other
> reason).
> # A query is started from node 1.
> # The root fragment is processed locally, and {{QueryBatchRequest}} arrives at
> node 2 before {{QueryStartRequest}}. This step is crucial because it puts an
> incomplete future into the mailbox registry
> ({{org.apache.ignite.internal.sql.engine.exec.MailboxRegistryImpl#locals}}).
> # The {{TopologyEventHandler}}s are notified on node 2. This causes the
> {{onNodeLeft}} handler to be chained to the future from the previous step.
> # {{QueryStartRequest}} arrives at node 2. The query fragment is created and
> immediately closed by the {{onNodeLeft}} handler.
> The problem is that the {{onNodeLeft}} handler is applied to a query that was
> started on a topology which already takes the node restart into account. We
> have to ignore such outdated events.
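
One possible way to ignore such outdated events is to compare the topology version a query was started on with the version at which the node disappeared. The sketch below is only an illustration under stated assumptions, not the actual Ignite code or the fix adopted for this ticket: the class `StaleNodeLeftFilter`, its `Fragment` type, and the numeric topology versions are all hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: skips NODE_LEFT events that predate the topology a query was started on. */
public class StaleNodeLeftFilter {
    /** A query fragment tagged with the (assumed) topology version its query was started on. */
    public static final class Fragment {
        final String initiatorNode;
        final long startTopologyVersion;
        volatile boolean closed;

        public Fragment(String initiatorNode, long startTopologyVersion) {
            this.initiatorNode = initiatorNode;
            this.startTopologyVersion = startTopologyVersion;
        }
    }

    private final Map<Long, Fragment> fragments = new ConcurrentHashMap<>();

    /** Registers a fragment under a query id (hypothetical stand-in for the mailbox registry). */
    public void register(long queryId, Fragment fragment) {
        fragments.put(queryId, fragment);
    }

    /**
     * Handles a possibly delayed NODE_LEFT event.
     *
     * @param nodeName name of the node that disappeared
     * @param eventTopologyVersion assumed version of the topology in which the node is gone
     * @return ids of the queries whose fragments were closed
     */
    public List<Long> onNodeLeft(String nodeName, long eventTopologyVersion) {
        List<Long> closedIds = new ArrayList<>();

        for (Map.Entry<Long, Fragment> entry : fragments.entrySet()) {
            Fragment fragment = entry.getValue();

            if (!fragment.initiatorNode.equals(nodeName)) {
                continue; // The fragment belongs to a query from another node.
            }

            if (fragment.startTopologyVersion >= eventTopologyVersion) {
                continue; // The query already accounts for the restart: the event is outdated.
            }

            fragment.closed = true;
            closedIds.add(entry.getKey());
        }

        return closedIds;
    }

    /** Returns true if the fragment registered under the given query id has been closed. */
    public boolean isClosed(long queryId) {
        Fragment fragment = fragments.get(queryId);
        return fragment != null && fragment.closed;
    }
}
```

In terms of the scenario above: if node 1 disappeared at an assumed version 5 and rejoined at version 6, a fragment of a query started from node 1 on version 6 would survive the delayed event, while a fragment of a query started on version 4 would still be closed.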