[ 
https://issues.apache.org/jira/browse/IGNITE-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Daschinskiy updated IGNITE-13540:
--------------------------------------
    Fix Version/s: 2.10

> Exchange worker, waiting for new task from queue, considered as blocked.
> ------------------------------------------------------------------------
>
>                 Key: IGNITE-13540
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13540
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Ivan Daschinskiy
>            Assignee: Ivan Daschinskiy
>            Priority: Major
>             Fix For: 2.10
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Waiting for a new task in ExchangeWorker#body is currently not marked as a
> blocking section. So if the network timeout (the timeout for polling a task
> from the queue) is greater than the system worker blocked timeout, the
> exchange worker thread is considered blocked. Sometimes this is reported in
> the logs a few seconds after PME (partition map exchange) has actually
> finished:
> {noformat}
> [2020-10-06 16:55:45,939][INFO ][exchange-worker-#50][org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager1] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=6, minorTopVer=1], force=false, evt=DISCOVERY_CUSTOM_EVT, node=163fd0f0-b9a4-4317-a28f-f7dbdb776076]
> [2020-10-06 16:55:48,822][ERROR][tcp-disco-msg-worker-[9e18957a 172.18.0.5:47500]-#2-#44][org.apache.ignite.internal.util.typedef.G1] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=partition-exchanger, threadName=exchange-worker-#50, blockedFor=2s]
> [2020-10-06 16:55:48,824][WARN ][tcp-disco-msg-worker-[9e18957a 172.18.0.5:47500]-#2-#44][org.apache.ignite.internal.util.typedef.G1] Thread [name="exchange-worker-#50", id=90, state=TIMED_WAITING, blockCnt=20, waitCnt=48]
>     Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14f29e0e, ownerName=null, ownerId=-1]
> [2020-10-06 16:55:48,827][WARN ][tcp-disco-msg-worker-[9e18957a 172.18.0.5:47500]-#2-#44][root1] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=partition-exchanger, igniteInstanceName=null, finished=false, heartbeatTs=1601992545941]]]
> class org.apache.ignite.IgniteException: GridWorker [name=partition-exchanger, igniteInstanceName=null, finished=false, heartbeatTs=1601992545941]
>       at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1860)
>       at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1855)
>       at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:234)
>       at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:299)
> {noformat}
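The Java sketch below models the mechanism described in the issue. When a worker waits on an empty task queue for longer than the blocked timeout, the watchdog flags it unless the wait is marked as a blocking section. This is a hypothetical, self-contained model, not Ignite code. MiniWorker, pollTask, isBlocked and reportedBlocked are illustrative names, and the watchdog rule is an assumption. blockingSectionBegin/blockingSectionEnd only loosely mirror the methods of the same name on Ignite's GridWorker. The poll timeout stands in for the network timeout and blockedTimeoutMs for the system worker blocked timeout. Both values are arbitrary.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical model, not Ignite code: MiniWorker, isBlocked and reportedBlocked
// are illustrative names; blockingSectionBegin/End loosely mirror GridWorker.
public class BlockingSectionDemo {

    static final class MiniWorker {
        // Last time the worker reported progress (cf. heartbeatTs in the log).
        volatile long heartbeatTs = System.currentTimeMillis();
        // True while the worker is expected to wait, e.g. on an empty task queue.
        volatile boolean inBlockingSection;

        void updateHeartbeat() {
            heartbeatTs = System.currentTimeMillis();
        }

        void blockingSectionBegin() {
            inBlockingSection = true;
        }

        void blockingSectionEnd() {
            inBlockingSection = false;
            updateHeartbeat(); // Leaving the section counts as progress.
        }

        // Waits up to pollTimeoutMs for the next task; optionally marks the wait
        // as a blocking section so the watchdog ignores it.
        Runnable pollTask(BlockingQueue<Runnable> queue, long pollTimeoutMs,
                          boolean markAsBlocking, CountDownLatch waitStarted)
                throws InterruptedException {
            updateHeartbeat();
            if (markAsBlocking)
                blockingSectionBegin();
            try {
                waitStarted.countDown(); // Tell the demo the wait is about to begin.
                return queue.poll(pollTimeoutMs, TimeUnit.MILLISECONDS);
            } finally {
                if (markAsBlocking)
                    blockingSectionEnd();
            }
        }
    }

    // Assumed watchdog rule: blocked = outside a blocking section and no
    // heartbeat for longer than blockedTimeoutMs.
    static boolean isBlocked(MiniWorker w, long nowMs, long blockedTimeoutMs) {
        return !w.inBlockingSection && nowMs - w.heartbeatTs > blockedTimeoutMs;
    }

    // Lets a worker idle on an empty queue and asks the watchdog mid-wait.
    static boolean reportedBlocked(boolean markAsBlocking) {
        MiniWorker worker = new MiniWorker();
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        CountDownLatch waitStarted = new CountDownLatch(1);
        long pollTimeoutMs = 5_000;  // Plays the role of the network timeout.
        long blockedTimeoutMs = 100; // Plays the role of the blocked timeout.

        Thread t = new Thread(() -> {
            try {
                worker.pollTask(queue, pollTimeoutMs, markAsBlocking, waitStarted);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // Demo shutdown.
            }
        });
        try {
            t.start();
            waitStarted.await();
            Thread.sleep(300); // Longer than blockedTimeoutMs, shorter than pollTimeoutMs.
            boolean blocked = isBlocked(worker, System.currentTimeMillis(), blockedTimeoutMs);
            t.interrupt(); // End the idle wait early.
            t.join();
            return blocked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("plain wait flagged as blocked: " + reportedBlocked(false));
        System.out.println("blocking-section wait flagged as blocked: " + reportedBlocked(true));
    }
}
```

Running main should print true for the plain wait and false for the wait wrapped in a blocking section. The log above is consistent with this picture. heartbeatTs=1601992545941 is 2020-10-06T13:55:45.941Z, which corresponds to 16:55:45,941 if the log is written in UTC+3. That is two milliseconds after the "Skipping rebalancing" message. The ERROR at 16:55:48,822 follows about 2.9 s later and reports blockedFor=2s. So the worker was flagged while it sat in TIMED_WAITING on the queue after PME had completed.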



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
