[ https://issues.apache.org/jira/browse/IGNITE-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Daschinskiy updated IGNITE-13540:
--------------------------------------
    Fix Version/s: 2.10

> Exchange worker, waiting for new task from queue, considered as blocked.
> ------------------------------------------------------------------------
>
>                 Key: IGNITE-13540
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13540
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Ivan Daschinskiy
>            Assignee: Ivan Daschinskiy
>            Priority: Major
>             Fix For: 2.10
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Waiting for a new task in ExchangeWorker#body is currently not marked as a
> blocking section.
> So if the network timeout (the timeout for polling a task from the queue) is
> greater than the system worker blocked timeout, the exchange worker thread is
> considered blocked. Sometimes this is reported in the logs a few seconds after
> PME (partition map exchange) has actually finished:
> {noformat}
> [2020-10-06 16:55:45,939][INFO ][exchange-worker-#50][org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager1] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=6, minorTopVer=1], force=false, evt=DISCOVERY_CUSTOM_EVT, node=163fd0f0-b9a4-4317-a28f-f7dbdb776076]
> [2020-10-06 16:55:48,822][ERROR][tcp-disco-msg-worker-[9e18957a 172.18.0.5:47500]-#2-#44][org.apache.ignite.internal.util.typedef.G1] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=partition-exchanger, threadName=exchange-worker-#50, blockedFor=2s]
> [2020-10-06 16:55:48,824][WARN ][tcp-disco-msg-worker-[9e18957a 172.18.0.5:47500]-#2-#44][org.apache.ignite.internal.util.typedef.G1] Thread [name="exchange-worker-#50", id=90, state=TIMED_WAITING, blockCnt=20, waitCnt=48]
>     Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14f29e0e, ownerName=null, ownerId=-1]
> [2020-10-06 16:55:48,827][WARN ][tcp-disco-msg-worker-[9e18957a 172.18.0.5:47500]-#2-#44][root1] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=partition-exchanger, igniteInstanceName=null, finished=false, heartbeatTs=1601992545941]]]
> class org.apache.ignite.IgniteException: GridWorker [name=partition-exchanger, igniteInstanceName=null, finished=false, heartbeatTs=1601992545941]
> 	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1860)
> 	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1855)
> 	at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:234)
> 	at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:299)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
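The failure mode in this issue can be reproduced with a small, self-contained model. This is a hypothetical sketch, not Ignite's GridWorker: the class `BlockingSectionSketch`, the methods `isBlocked`, `pollTask` and `watchdogFlags`, the `Long.MAX_VALUE` sentinel, and the 300 ms / 100 ms values are all assumptions chosen for illustration. Only the general idea comes from the issue: a watchdog flags a worker whose heartbeat is older than the blocked timeout, so a queue wait longer than that timeout must be marked as a blocking section to avoid a false report.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, simplified model of a watchdog-monitored worker.
 * NOT Ignite's GridWorker; names only mirror the heartbeat / blocking-section idea.
 */
public class BlockingSectionSketch {

    public static final class Worker {
        /** Assumed sentinel meaning "inside a blocking section, skip the check". */
        private static final long IN_BLOCKING_SECTION = Long.MAX_VALUE;

        private volatile long heartbeatTs = System.currentTimeMillis();

        public void updateHeartbeat() {
            heartbeatTs = System.currentTimeMillis();
        }

        public void blockingSectionBegin() {
            heartbeatTs = IN_BLOCKING_SECTION;
        }

        public void blockingSectionEnd() {
            // Leaving the section counts as a fresh heartbeat.
            updateHeartbeat();
        }

        /** Watchdog check: true if the last heartbeat is older than the threshold. */
        public boolean isBlocked(long nowMs, long blockedTimeoutMs) {
            long ts = heartbeatTs;
            return ts != IN_BLOCKING_SECTION && nowMs - ts > blockedTimeoutMs;
        }

        /** Waits for the next task, optionally marking the wait as a blocking section. */
        public <T> T pollTask(BlockingQueue<T> queue, long timeoutMs, boolean markBlocking)
                throws InterruptedException {
            if (markBlocking) {
                blockingSectionBegin();
            }
            try {
                return queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
            } finally {
                if (markBlocking) {
                    blockingSectionEnd();
                }
            }
        }
    }

    /**
     * Polls an empty queue for longer than the blocked-worker threshold and
     * asks the watchdog, mid-wait, whether the worker looks blocked.
     */
    static boolean watchdogFlags(boolean markBlocking) throws InterruptedException {
        final long pollTimeoutMs = 300;    // stands in for the network timeout
        final long blockedTimeoutMs = 100; // stands in for the blocked-worker timeout

        Worker worker = new Worker();
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(); // stays empty

        Thread t = new Thread(() -> {
            try {
                worker.pollTask(queue, pollTimeoutMs, markBlocking);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "exchange-worker-sketch");
        t.start();

        Thread.sleep(200); // check while the worker is normally still inside poll()
        boolean flagged = worker.isBlocked(System.currentTimeMillis(), blockedTimeoutMs);
        t.join();
        return flagged;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("wait not marked as blocking: flagged=" + watchdogFlags(false));
        System.out.println("wait marked as blocking:     flagged=" + watchdogFlags(true));
    }
}
```

In a typical run the unmarked wait is flagged (its heartbeat is older than the 100 ms threshold while the 300 ms poll is still pending), and the marked wait is not. The second result depends on thread scheduling, so `main` is illustrative; the deterministic part of the model is `isBlocked`, which skips the check while the sentinel is set.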