Denis Chudov created IGNITE-25394:
-------------------------------------

             Summary: Log flooding on remaining nodes when a node is stopped
                 Key: IGNITE-25394
                 URL: https://issues.apache.org/jira/browse/IGNITE-25394
             Project: Ignite
          Issue Type: Bug
            Reporter: Denis Chudov


To reproduce:
 # Start a 3-node cluster (in Docker) and initialize it

 # Stop a node, either:

 ## the leader (node1), or

 ## a follower (node2 or node3)

For 2A (the leader, node1, is stopped):

The followers' logs (node2 and node3) will be flooded with:

 
{code:java}
2025-02-24 22:40:32 2025-02-25 03:40:32:451 +0000 
[ERROR][%node2%JRaft-StepDownTimer-14][ReplicatorGroupImpl] Fail to check 
replicator connection to peer=node1, replicatorType=Follower.
2025-02-24 22:40:33 2025-02-25 03:40:33:052 +0000 
[ERROR][%node2%JRaft-StepDownTimer-10][ReplicatorGroupImpl] Fail to check 
replicator connection to peer=node1, replicatorType=Follower.
2025-02-24 22:40:33 2025-02-25 03:40:33:652 +0000 
[ERROR][%node2%JRaft-StepDownTimer-9][ReplicatorGroupImpl] Fail to check 
replicator connection to peer=node1, replicatorType=Follower.
2025-02-24 22:40:34 2025-02-25 03:40:34:253 +0000 
[ERROR][%node2%JRaft-StepDownTimer-2][ReplicatorGroupImpl] Fail to check 
replicator connection to peer=node1, replicatorType=Follower.
2025-02-24 22:40:34 2025-02-25 03:40:34:854 +0000 
[ERROR][%node2%JRaft-StepDownTimer-16][ReplicatorGroupImpl] Fail to check 
replicator connection to peer=node1, replicatorType=Follower.
2025-02-24 22:40:35 2025-02-25 03:40:35:454 +0000 
[ERROR][%node2%JRaft-StepDownTimer-4][ReplicatorGroupImpl] Fail to check 
replicator connection to peer=node1, replicatorType=Follower.
2025-02-24 22:40:36 2025-02-25 03:40:36:055 +0000 
[ERROR][%node2%JRaft-StepDownTimer-17][ReplicatorGroupImpl] Fail to check 
replicator connection to peer=node1, replicatorType=Follower.
2025-02-24 22:40:36 2025-02-25 03:40:36:656 +0000 
[ERROR][%node2%JRaft-StepDownTimer-6][ReplicatorGroupImpl] Fail to check 
replicator connection to peer=node1, replicatorType=Follower.{code}
For 2B (any follower node is stopped):

The leader's log (node1) will be flooded with:
{code:java}
2025-02-24 22:35:50 2025-02-25 03:35:50:856 +0000 
[WARNING][%node1%JRaft-AppendEntries-Processor-0][Replicator] Fail to issue RPC 
to node2, consecutiveErrorTimes=1350, error=Status[EINTERNAL<1004>: Check 
connection[node2] fail and try to create new one]
2025-02-24 22:35:51 2025-02-25 03:35:51:460 +0000 
[WARNING][%node1%JRaft-AppendEntries-Processor-0][Replicator] Fail to issue RPC 
to node2, consecutiveErrorTimes=1360, error=Status[EINTERNAL<1004>: Check 
connection[node2] fail and try to create new one]
2025-02-24 22:35:51 2025-02-25 03:35:51:460 +0000 
[WARNING][%node1%JRaft-AppendEntries-Processor-2][Replicator] Fail to issue RPC 
to node2, consecutiveErrorTimes=1570, error=Status[EINTERNAL<1004>: Check 
connection[node2] fail and try to create new one]
2025-02-24 22:35:52 2025-02-25 03:35:52:063 +0000 
[WARNING][%node1%JRaft-AppendEntries-Processor-0][Replicator] Fail to issue RPC 
to node2, consecutiveErrorTimes=1370, error=Status[EINTERNAL<1004>: Check 
connection[node2] fail and try to create new one]
2025-02-24 22:35:52 2025-02-25 03:35:52:063 +0000 
[WARNING][%node1%JRaft-AppendEntries-Processor-2][Replicator] Fail to issue RPC 
to node2, consecutiveErrorTimes=1580, error=Status[EINTERNAL<1004>: Check 
connection[node2] fail and try to create new one]
2025-02-24 22:35:52 2025-02-25 03:35:52:666 +0000 
[WARNING][%node1%JRaft-AppendEntries-Processor-2][Replicator] Fail to issue RPC 
to node2, consecutiveErrorTimes=1590, error=Status[EINTERNAL<1004>: Check 
connection[node2] fail and try to create new one]
2025-02-24 22:35:52 2025-02-25 03:35:52:666 +0000 
[WARNING][%node1%JRaft-AppendEntries-Processor-0][Replicator] Fail to issue RPC 
to node2, consecutiveErrorTimes=1380, error=Status[EINTERNAL<1004>: Check 
connection[node2] fail and try to create new one] {code}
These errors need to be throttled somehow, as they pollute the logs and make 
it more challenging to gather and analyze logs during node stoppage 
incidents.
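
One possible shape for such throttling, as a minimal sketch only: emit the first occurrence of a message for a given key, suppress repeats for a fixed interval, and append the number of suppressed repeats to the next message that is let through. The class name ThrottledLog, the filter method, and the key format are hypothetical and are not part of Ignite or JRaft; the actual fix may look quite different.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper for illustration only; not an Ignite or JRaft class. */
final class ThrottledLog {
    /** Per-key state: when a message was last emitted and how many were suppressed since. */
    private static final class State {
        long lastEmittedMillis;
        long suppressed;
    }

    private final long intervalMillis;
    private final ConcurrentHashMap<String, State> states = new ConcurrentHashMap<>();

    ThrottledLog(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /**
     * Returns the text that should be written to the log, or null if the
     * message must be suppressed. The caller passes the current time so the
     * logic can be tested without sleeping.
     */
    String filter(String key, String message, long nowMillis) {
        String[] out = new String[1];
        // compute() runs atomically per key, so concurrent timer threads
        // cannot both decide to emit within the same interval.
        states.compute(key, (k, s) -> {
            if (s == null) {
                // First occurrence for this key: always emit.
                s = new State();
                s.lastEmittedMillis = nowMillis;
                out[0] = message;
            } else if (nowMillis - s.lastEmittedMillis >= intervalMillis) {
                // Interval elapsed: emit again and report what was dropped.
                out[0] = s.suppressed == 0
                        ? message
                        : message + " (" + s.suppressed + " similar messages suppressed)";
                s.lastEmittedMillis = nowMillis;
                s.suppressed = 0;
            } else {
                // Still inside the interval: count and drop.
                s.suppressed++;
            }
            return s;
        });
        return out[0];
    }
}
```

A call site would build the key from the message kind and the peer (for example "check-connection:node1", an assumed format), call filter(key, text, System.currentTimeMillis()), and log only when the result is not null. Keying by peer keeps a failure for one stopped node from hiding failures for another.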


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
