Denis Chudov created IGNITE-25394:
-------------------------------------

             Summary: Log flooding on remaining nodes when a node is stopped
                 Key: IGNITE-25394
                 URL: https://issues.apache.org/jira/browse/IGNITE-25394
             Project: Ignite
          Issue Type: Bug
            Reporter: Denis Chudov
To reproduce:
# Start a 3-node cluster (in Docker) and initialize it
# Stop a node
## Leader (node1)
## Follower (node2 or node3)

For 2A (the leader, node1, is stopped): the followers' logs (node2 and node3) will be flooded with
{code:java}
2025-02-24 22:40:32 2025-02-25 03:40:32:451 +0000 [ERROR][%node2%JRaft-StepDownTimer-14][ReplicatorGroupImpl] Fail to check replicator connection to peer=node1, replicatorType=Follower.
2025-02-24 22:40:33 2025-02-25 03:40:33:052 +0000 [ERROR][%node2%JRaft-StepDownTimer-10][ReplicatorGroupImpl] Fail to check replicator connection to peer=node1, replicatorType=Follower.
2025-02-24 22:40:33 2025-02-25 03:40:33:652 +0000 [ERROR][%node2%JRaft-StepDownTimer-9][ReplicatorGroupImpl] Fail to check replicator connection to peer=node1, replicatorType=Follower.
2025-02-24 22:40:34 2025-02-25 03:40:34:253 +0000 [ERROR][%node2%JRaft-StepDownTimer-2][ReplicatorGroupImpl] Fail to check replicator connection to peer=node1, replicatorType=Follower.
2025-02-24 22:40:34 2025-02-25 03:40:34:854 +0000 [ERROR][%node2%JRaft-StepDownTimer-16][ReplicatorGroupImpl] Fail to check replicator connection to peer=node1, replicatorType=Follower.
2025-02-24 22:40:35 2025-02-25 03:40:35:454 +0000 [ERROR][%node2%JRaft-StepDownTimer-4][ReplicatorGroupImpl] Fail to check replicator connection to peer=node1, replicatorType=Follower.
2025-02-24 22:40:36 2025-02-25 03:40:36:055 +0000 [ERROR][%node2%JRaft-StepDownTimer-17][ReplicatorGroupImpl] Fail to check replicator connection to peer=node1, replicatorType=Follower.
2025-02-24 22:40:36 2025-02-25 03:40:36:656 +0000 [ERROR][%node2%JRaft-StepDownTimer-6][ReplicatorGroupImpl] Fail to check replicator connection to peer=node1, replicatorType=Follower.
{code}

For 2B (any follower node is stopped, here node2): the leader's log (node1) will be flooded with
{code:java}
2025-02-24 22:35:50 2025-02-25 03:35:50:856 +0000 [WARNING][%node1%JRaft-AppendEntries-Processor-0][Replicator] Fail to issue RPC to node2, consecutiveErrorTimes=1350, error=Status[EINTERNAL<1004>: Check connection[node2] fail and try to create new one]
2025-02-24 22:35:51 2025-02-25 03:35:51:460 +0000 [WARNING][%node1%JRaft-AppendEntries-Processor-0][Replicator] Fail to issue RPC to node2, consecutiveErrorTimes=1360, error=Status[EINTERNAL<1004>: Check connection[node2] fail and try to create new one]
2025-02-24 22:35:51 2025-02-25 03:35:51:460 +0000 [WARNING][%node1%JRaft-AppendEntries-Processor-2][Replicator] Fail to issue RPC to node2, consecutiveErrorTimes=1570, error=Status[EINTERNAL<1004>: Check connection[node2] fail and try to create new one]
2025-02-24 22:35:52 2025-02-25 03:35:52:063 +0000 [WARNING][%node1%JRaft-AppendEntries-Processor-0][Replicator] Fail to issue RPC to node2, consecutiveErrorTimes=1370, error=Status[EINTERNAL<1004>: Check connection[node2] fail and try to create new one]
2025-02-24 22:35:52 2025-02-25 03:35:52:063 +0000 [WARNING][%node1%JRaft-AppendEntries-Processor-2][Replicator] Fail to issue RPC to node2, consecutiveErrorTimes=1580, error=Status[EINTERNAL<1004>: Check connection[node2] fail and try to create new one]
2025-02-24 22:35:52 2025-02-25 03:35:52:666 +0000 [WARNING][%node1%JRaft-AppendEntries-Processor-2][Replicator] Fail to issue RPC to node2, consecutiveErrorTimes=1590, error=Status[EINTERNAL<1004>: Check connection[node2] fail and try to create new one]
2025-02-24 22:35:52 2025-02-25 03:35:52:666 +0000 [WARNING][%node1%JRaft-AppendEntries-Processor-0][Replicator] Fail to issue RPC to node2, consecutiveErrorTimes=1380, error=Status[EINTERNAL<1004>: Check connection[node2] fail and try to create new one]
{code}

These errors need to be throttled somehow, as they pollute the logs and will make it more challenging to gather and analyze logs during incidents involving node stoppage.
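One possible shape of such throttling is sketched below. This is only an illustrative, self-contained sketch, not the existing Ignite or JRaft logging API; the class name ThrottledLog, its methods, and the key format are hypothetical. The idea is to emit a given message at most once per interval per key (e.g. per peer) and report how many identical occurrences were suppressed in between; the call sites in Replicator and ReplicatorGroupImpl that produce the messages above could route them through something like this:
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/**
 * Per-key log throttle (illustrative sketch): a message is emitted at most once per interval
 * for a given key (e.g. "replicator-check:" + peerId); occurrences suppressed in between are
 * counted and reported with the next emitted message.
 */
public class ThrottledLog {
    /** Per-key state: time of the last emitted message and the number of suppressed ones. */
    private static class State {
        volatile long lastEmittedNanos;
        final AtomicLong suppressed = new AtomicLong();
    }

    private final ConcurrentHashMap<String, State> states = new ConcurrentHashMap<>();

    private final long intervalNanos;

    public ThrottledLog(long interval, TimeUnit unit) {
        this.intervalNanos = unit.toNanos(interval);
    }

    /** Passes the message to {@code sink} if the interval for {@code key} has elapsed, otherwise just counts it. */
    public void log(String key, String message, Consumer<String> sink) {
        long now = System.nanoTime();

        State state = states.computeIfAbsent(key, k -> {
            State s = new State();
            // Backdate the initial timestamp so the very first message for a key is always emitted.
            s.lastEmittedNanos = now - intervalNanos;
            return s;
        });

        if (now - state.lastEmittedNanos >= intervalNanos) {
            // Best effort under concurrency: two racing threads may both emit once, which is acceptable for logging.
            state.lastEmittedNanos = now;

            long skipped = state.suppressed.getAndSet(0);

            sink.accept(skipped > 0 ? message + " [suppressed " + skipped + " similar messages]" : message);
        } else {
            state.suppressed.incrementAndGet();
        }
    }
}
{code}
A call site would then look roughly like {{throttledLog.log("check-connection:" + peerId, msg, LOG::error)}}. With an interval in the tens of seconds this collapses the hundreds of identical lines above into one line per interval plus a suppressed-count summary; an alternative is to keep the first failure at ERROR/WARNING and demote subsequent ones to DEBUG until the connection is re-established.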