[ https://issues.apache.org/jira/browse/IGNITE-12714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044767#comment-17044767 ]
Andrey N. Gura commented on IGNITE-12714:
-----------------------------------------
[~akalashnikov] Increasing the timeout will not fix the problem. I believe the
problem is in the data streamer threads: they should update their progress
timestamps.
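
To illustrate what "updating progress timestamps" could look like, here is a minimal sketch. It is not Ignite's GridWorker or data streamer code; the class HeartbeatWorkerSketch and all of its methods are hypothetical names. It assumes a worker that waits for tasks with a bounded poll and refreshes a heartbeat on every wake-up, so a watchdog that compares the heartbeat age against a timeout does not mistake an idle wait for a blocked thread.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical worker (not Ignite's GridWorker) that refreshes a heartbeat while idle. */
public class HeartbeatWorkerSketch {
    /** Time (ms) of the last sign of progress; a watchdog reads it from another thread. */
    private volatile long heartbeatTs = System.currentTimeMillis();

    private final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();

    void updateHeartbeat() {
        heartbeatTs = System.currentTimeMillis();
    }

    long heartbeatTs() {
        return heartbeatTs;
    }

    void submit(Runnable task) {
        tasks.add(task);
    }

    /** One loop iteration: bounded wait for a task instead of parking indefinitely. */
    boolean runOnce(long pollTimeoutMs) throws InterruptedException {
        Runnable task = tasks.poll(pollTimeoutMs, TimeUnit.MILLISECONDS);

        // Waking up from an idle wait counts as progress.
        updateHeartbeat();

        if (task == null)
            return false;

        task.run();

        // Finishing a task counts as progress too.
        updateHeartbeat();

        return true;
    }

    /** Watchdog-side check: true if the heartbeat is older than the allowed timeout. */
    boolean isBlocked(long nowMs, long blockedTimeoutMs) {
        return nowMs - heartbeatTs > blockedTimeoutMs;
    }

    public static void main(String[] args) throws InterruptedException {
        HeartbeatWorkerSketch worker = new HeartbeatWorkerSketch();

        // No tasks queued: the worker still refreshes its heartbeat on each wake-up.
        for (int i = 0; i < 3; i++)
            worker.runOnce(50);

        System.out.println("blocked=" + worker.isBlocked(System.currentTimeMillis(), 16_000));
    }
}
```

With this shape, raising the timeout only matters when a single task really runs longer than the limit; an idle stripe never trips the check.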
> Absence of default value of IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT
> ----------------------------------------------------------------
>
> Key: IGNITE-12714
> URL: https://issues.apache.org/jira/browse/IGNITE-12714
> Project: Ignite
> Issue Type: Bug
> Reporter: Anton Kalashnikov
> Assignee: Anton Kalashnikov
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Scenario:
> 1. Start 3 data nodes
> 2. Start load with a streamer on 6 clients
> 3. Start data nodes restarter
> Result:
> Keys weren't loaded in all (1000) caches.
> In the server node log I see:
> {noformat}
> [2019-07-17 16:52:36,881][ERROR][tcp-disco-msg-worker-#2] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-7, blockedFor=16s]
> [2019-07-17 16:52:36,883][WARN ][tcp-disco-msg-worker-#2] Thread [name="data-streamer-stripe-7-#24", id=43, state=WAITING, blockCnt=111, waitCnt=169964]
> [2019-07-17 16:52:36,885][ERROR][tcp-disco-msg-worker-#2] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-7, igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]]]
> org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-7, igniteInstanceName=null, finished=false, heartbeatTs=1563371540069]
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1838) ~[ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1833) ~[ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:230) ~[ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) ~[ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2804) ~[ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7568) [ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2866) [ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7506) [ignite-core-2.5.9.jar:2.5.9]
>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) [ignite-core-2.5.9.jar:2.5.9]
> {noformat}
> *Solution:*
> Increase the timeout to 2 min:
> org.apache.ignite.IgniteSystemProperties#IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT
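
For reference, a minimal sketch of what a 2-minute value for this property could look like. It assumes the property is read as a JVM system property and expressed in milliseconds (so 2 min = 120000); the class BlockedTimeoutSketch and the resolveTimeout helper are hypothetical and only mimic a "system property with a fallback default" lookup without depending on Ignite.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: resolves the blocked-worker timeout from a system property. */
public class BlockedTimeoutSketch {
    /** Property name as given in the issue. */
    static final String PROP = "IGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT";

    /** Assumption: the value is in milliseconds, so 2 minutes is 120000. */
    static final long TWO_MINUTES_MS = TimeUnit.MINUTES.toMillis(2);

    /** Returns the property value if it is set and parses as a long, otherwise the given default. */
    static long resolveTimeout(long dfltMs) {
        return Long.getLong(PROP, dfltMs);
    }

    public static void main(String[] args) {
        // Same effect as starting the JVM with -DIGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT=120000.
        System.setProperty(PROP, Long.toString(TWO_MINUTES_MS));

        System.out.println(PROP + "=" + resolveTimeout(0L));
    }
}
```

In a real deployment the property would normally be passed on the JVM command line so that it is set before the node starts.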