[ 
https://issues.apache.org/jira/browse/IGNITE-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Daschinskiy updated IGNITE-13540:
--------------------------------------
    Description: 
Waiting for a new task in ExchangeWorker#body is currently not marked as a 
blocking section.
So if the network timeout (which is used as the timeout for polling a task from 
the queue) is greater than the system worker blocked timeout, the exchange 
worker thread is reported as blocked while it is merely idle. Sometimes this 
shows up in the logs a few seconds after PME has actually finished:


{noformat}
[2020-10-06 16:55:45,939][INFO ][exchange-worker-#50][org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager1] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=6, minorTopVer=1], force=false, evt=DISCOVERY_CUSTOM_EVT, node=163fd0f0-b9a4-4317-a28f-f7dbdb776076]
[2020-10-06 16:55:48,822][ERROR][tcp-disco-msg-worker-[9e18957a 172.18.0.5:47500]-#2-#44][org.apache.ignite.internal.util.typedef.G1] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=partition-exchanger, threadName=exchange-worker-#50, blockedFor=2s]
[2020-10-06 16:55:48,824][WARN ][tcp-disco-msg-worker-[9e18957a 172.18.0.5:47500]-#2-#44][org.apache.ignite.internal.util.typedef.G1] Thread [name="exchange-worker-#50", id=90, state=TIMED_WAITING, blockCnt=20, waitCnt=48]
    Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14f29e0e, ownerName=null, ownerId=-1]

[2020-10-06 16:55:48,827][WARN ][tcp-disco-msg-worker-[9e18957a 172.18.0.5:47500]-#2-#44][root1] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=partition-exchanger, igniteInstanceName=null, finished=false, heartbeatTs=1601992545941]]]
class org.apache.ignite.IgniteException: GridWorker [name=partition-exchanger, igniteInstanceName=null, finished=false, heartbeatTs=1601992545941]
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1860)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1855)
        at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:234)
        at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:299)
{noformat}
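
The false positive described in the prose above the log can be illustrated with 
a minimal, self-contained sketch. It is a simplified model and not Ignite's 
actual code: the class ExchangeWorkerSketch, its method names and the timeout 
values (5000 ms poll timeout, 100 ms blocked timeout, 300 ms delay before the 
check) are assumptions chosen for illustration. The sketch assumes the watchdog 
compares the current time with the worker's last heartbeat, and that a blocking 
section is modelled by setting the heartbeat to Long.MAX_VALUE for the duration 
of the wait.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Simplified, hypothetical model of a heartbeat-monitored worker (not Ignite's code). */
class ExchangeWorkerSketch {
    /** Last heartbeat timestamp; Long.MAX_VALUE means "inside a blocking section". */
    private volatile long heartbeatTs = System.currentTimeMillis();

    /** Task queue; it stays empty in this sketch, so every poll times out or is interrupted. */
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    void updateHeartbeat() { heartbeatTs = System.currentTimeMillis(); }

    void blockingSectionBegin() { heartbeatTs = Long.MAX_VALUE; }

    void blockingSectionEnd() { updateHeartbeat(); }

    /** Watchdog check: the worker counts as blocked if its heartbeat is older than the timeout. */
    boolean consideredBlocked(long nowMs, long blockedTimeoutMs) {
        return nowMs - heartbeatTs > blockedTimeoutMs;
    }

    /** Waits for the next task; optionally marks the wait as a blocking section. */
    Runnable pollTask(long pollTimeoutMs, boolean markBlocking) throws InterruptedException {
        updateHeartbeat();

        if (markBlocking)
            blockingSectionBegin();

        try {
            return queue.poll(pollTimeoutMs, TimeUnit.MILLISECONDS);
        }
        finally {
            if (markBlocking)
                blockingSectionEnd();
        }
    }

    /** Runs one idle wait and returns what the watchdog check reports in the middle of it. */
    static boolean reportedAsBlocked(boolean markBlocking) {
        ExchangeWorkerSketch worker = new ExchangeWorkerSketch();

        Thread t = new Thread(() -> {
            try {
                worker.pollTask(5_000, markBlocking); // Poll timeout far above the blocked timeout.
            }
            catch (InterruptedException ignored) {
                // Expected: the demo interrupts the wait early.
            }
        }, "exchange-worker-sketch");

        t.start();

        try {
            Thread.sleep(300); // Let the worker sit idle in poll().

            boolean blocked = worker.consideredBlocked(System.currentTimeMillis(), 100);

            t.interrupt();
            t.join();

            return blocked;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unmarked wait reported as blocked: " + reportedAsBlocked(false));
        System.out.println("marked wait reported as blocked:   " + reportedAsBlocked(true));
    }
}
```

In this model the unmarked wait is flagged because the heartbeat is not 
refreshed while the thread sits in poll(), whereas the marked wait is skipped 
by the check. Whether the real fix takes exactly this shape is not stated in 
this message.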


  was:
Waiting for new task in ExchangeWorker#body now is not marking as blocking 
section.
So if network timeout (timeout for polling task from queue) is greater than 
system worker blocked timeout, exchange worker thread is considered as 
blocking. Sometimes this is reported in logs after few seconds when actually 
PME is finished





> Exchange worker, waiting for new task from queue, considered as blocked.
> ------------------------------------------------------------------------
>
>                 Key: IGNITE-13540
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13540
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Ivan Daschinskiy
>            Assignee: Ivan Daschinskiy
>            Priority: Major



