[jira] [Updated] (IGNITE-24342) [Flaky] Cannot reliably start 3-nodes cluster on a single Windows machine

Vyacheslav Koptilin (Jira) Tue, 11 Mar 2025 05:11:08 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-24342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Vyacheslav Koptilin updated IGNITE-24342:
-----------------------------------------
    Labels: ignite-3  (was: )

> [Flaky] Cannot reliably start 3-nodes cluster on a single Windows machine
> -------------------------------------------------------------------------
>
>                 Key: IGNITE-24342
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24342
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 3.0
>         Environment: A single Windows 10 machine with 32 GB of RAM
>            Reporter: Andrey Khitrin
>            Priority: Major
>              Labels: ignite-3
>         Attachments: logs.tgz
>
>
> The issue does not reproduce every time, but it occurs often enough to be
> observed.
> How to reproduce:
>  # Try to start 3 Apache Ignite (AI) nodes on a single machine, using a
> static `nodeFinder` (configs are attached):
> {code:java}
>         nodeFinder {
>             netClusterNodes=[
>                 "127.0.0.1:3344",
>                 "127.0.0.1:3345",
>                 "127.0.0.1:3346"
>             ]
>             type=STATIC
>         }
> {code}
> Expected result: all nodes are up.
> Actual result: 2 of the 3 nodes terminate with thread dumps, and the cluster
> cannot be initialized.
> Key exceptions in logs:
>  # "IllegalStateException: cannot send more responses than requests" (see 
> attachment)
>  # Various RAFT-related and timeout errors:
> {code:java}
> 2025-01-28 06:03:05:471 -0600 
> [ERROR][%TablesAmountCapacityMultiNodeTest_cluster_1%JRaft-Response-Processor-8][AbstractClientService]
>  Fail to connect TablesAmountCapacityMultiNodeTest_cluster_0, exception: 
> java.util.concurrent.TimeoutException.
> 2025-01-28 06:03:05:815 -0600 
> [INFO][%TablesAmountCapacityMultiNodeTest_cluster_1%JRaft-Request-Processor-24][NodeImpl]
>  Node <cmg_group/TablesAmountCapacityMultiNodeTest_cluster_1> ignore 
> PreVoteRequest from TablesAmountCapacityMultiNodeTest_cluster_0, term=2, 
> currTerm=1, because the leader TablesAmountCapacityMultiNodeTest_cluster_1's 
> lease is still valid.
> 2025-01-28 06:03:05:815 -0600 
> [ERROR][%TablesAmountCapacityMultiNodeTest_cluster_1%JRaft-Response-Processor-8][ReplicatorGroupImpl]
>  Fail to check replicator connection to 
> peer=TablesAmountCapacityMultiNodeTest_cluster_0, replicatorType=Follower.
> 2025-01-28 06:03:05:836 -0600 
> [ERROR][%TablesAmountCapacityMultiNodeTest_cluster_1%JRaft-Response-Processor-8][NodeImpl]
>  Fail to add a replicator, peer=TablesAmountCapacityMultiNodeTest_cluster_0.
> {code}
>  # Thread dumps in the logs of 2 of the 3 nodes (see attachment)
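A possible first diagnostic step, offered only as a sketch: since the logs report `Fail to connect ... java.util.concurrent.TimeoutException`, it may help to check whether every node actually accepts TCP connections on its address from the static `netClusterNodes` list. The `NodeFinderProbe` class below is hypothetical and uses only the JDK (no Ignite API). It assumes each node listens on exactly the address listed in the config, and it cannot distinguish an Ignite node from any other process bound to the same port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical diagnostic helper: probes the static nodeFinder addresses with plain TCP connects. */
public class NodeFinderProbe {
    // Addresses copied from the netClusterNodes list in the issue description.
    static final List<String> NET_CLUSTER_NODES =
            List.of("127.0.0.1:3344", "127.0.0.1:3345", "127.0.0.1:3346");

    /** Returns true if a TCP connection to "host:port" succeeds within timeoutMs milliseconds. */
    static boolean isReachable(String hostPort, int timeoutMs) {
        int colon = hostPort.lastIndexOf(':');
        String host = hostPort.substring(0, colon);
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections and connect timeouts (SocketTimeoutException is an IOException).
            return false;
        }
    }

    /** Probes every address in order and records whether it accepted a connection. */
    static Map<String, Boolean> probeAll(List<String> addresses, int timeoutMs) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String address : addresses) {
            result.put(address, isReachable(address, timeoutMs));
        }
        return result;
    }

    public static void main(String[] args) {
        probeAll(NET_CLUSTER_NODES, 1_000).forEach((address, ok) ->
                System.out.println(address + " -> " + (ok ? "listening" : "NOT reachable")));
    }
}
```

Running the probe while the three nodes are starting shows whether a node is missing at the socket level, or whether all ports accept connections and the failure lies higher up, in the Raft layer.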



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
