Igor created IGNITE-24722:
-----------------------------
Summary: [FLAKY][Windows] 1 node goes down when 3 nodes cluster is
started on 9 cores cpu
Key: IGNITE-24722
URL: https://issues.apache.org/jira/browse/IGNITE-24722
Project: Ignite
Issue Type: Bug
Components: general, platforms
Affects Versions: 3.1
Environment: 3 nodes on a single Windows machine (cores=9, memory=32766)
Reporter: Igor
Attachments: cluster logs.zip
*Steps to reproduce:*
1. Start 3 nodes on a single Windows machine (cores=9, memory=32766)
*Expected:*
All 3 nodes start and join the cluster.
*Actual:*
1 node produces a thread dump and shuts down.
The node's log contains messages like:
{code:java}
2025-03-05 22:19:32:184 -0600
[WARNING][%BasicAi3Operations3NodesTest_cluster_1%common-scheduler-0][FailureManager]
Possible failure suppressed according to a configured handler
[hnd=NoOpFailureHandler [super=AbstractFailureHandler
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=SYSTEM_WORKER_BLOCKED]
org.apache.ignite.lang.IgniteException: IGN-WORKERS-1
TraceId:538a0c73-bc2e-481b-a5df-45ab414c3e15 A critical thread is blocked for
2978 ms that is more than the allowed 500 ms, it is
"%BasicAi3Operations3NodesTest_cluster_1%MessagingService-inbound-Default-0-0"
prio=10 Id=153 WAITING on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@608a31a6
at java.base/jdk.internal.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@608a31a6
at java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2081)
at java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:433)
at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
{code}
and
{code:java}
2025-03-05 22:19:32:535 -0600
[INFO][%BasicAi3Operations3NodesTest_cluster_1%MessagingService-inbound-Default-0-0][DistributionZoneManager]
Failed to update distribution zones' logical topology and version keys
[topology = [{id=71f7ef04-da2f-45d2-a1f1-b802e0542f67,
name=BasicAi3Operations3NodesTest_cluster_0, address=172.25.1.11:3344}],
version = 1]
2025-03-05 22:19:32:545 -0600
[INFO][%BasicAi3Operations3NodesTest_cluster_1%MessagingService-inbound-Default-0-0][DistributionZoneManager]
Failed to update distribution zones' logical topology and version keys
[topology = [{id=71f7ef04-da2f-45d2-a1f1-b802e0542f67,
name=BasicAi3Operations3NodesTest_cluster_0, address=172.25.1.11:3344},
{id=764f1058-8120-43e0-bdc1-e2e49ce31818,
name=BasicAi3Operations3NodesTest_cluster_2, address=172.25.1.11:3346}],
version = 2] {code}
Logs are attached.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)