[https://issues.apache.org/jira/browse/IGNITE-12760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Alexey Goncharuk updated IGNITE-12760:
--------------------------------------
Fix Version/s: 2.9
> Prevent AssertionError on message unmarshalling when classLoaderId contains
> the ID of a node that has already left
> ---------------------------------------------------------------------------------------------------------
>
> Key: IGNITE-12760
> URL: https://issues.apache.org/jira/browse/IGNITE-12760
> Project: Ignite
> Issue Type: Bug
> Reporter: Denis Chudov
> Assignee: Denis Chudov
> Priority: Major
> Fix For: 2.9
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The following AssertionError triggers the failure handler and crashes the node.
> It can possibly crash the whole cluster.
> {code:java}
> 2020-02-18 14:34:09.775[ERROR][query-#146129%DPL_GRID%DplGridNodeName%][o.a.i.i.p.cache.GridCacheIoManager]
> Failed to process message [senderId=727757ed-4ad4-4779-bda9-081525725cce,
> msg=GridCacheQueryRequest [id=178,
> cacheName=com.sbt.tokenization.data.entity.KEKEntity_DPL_union-module,
> type=SCAN, fields=false, clause=null, clsName=null, keyValFilter=null,
> rdc=null, trans=null, pageSize=1024, incBackups=false, cancel=false,
> incMeta=false, all=false, keepBinary=true,
> subjId=727757ed-4ad4-4779-bda9-081525725cce, taskHash=0, part=-1,
> topVer=AffinityTopologyVersion [topVer=97, minorTopVer=0], sendTimestamp=-1,
> receiveTimestamp=-1, super=GridCacheIdMessage [cacheId=-1129073400,
> super=GridCacheMessage [msgId=179, depInfo=GridDeploymentInfoBean
> [clsLdrId=c32670e3071-d30ee64b-0833-45d4-abbe-fb6282669caa, depMode=SHARED,
> userVer=0, locDepOwner=false, participants=null],
> lastAffChangedTopVer=AffinityTopologyVersion [topVer=8, minorTopVer=6],
> err=null, skipPrepare=false]]]]
> java.lang.AssertionError: null
> at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager$CachedDeploymentInfo.<init>(GridCacheDeploymentManager.java:918)
> at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager$CachedDeploymentInfo.<init>(GridCacheDeploymentManager.java:889)
> at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager.p2pContext(GridCacheDeploymentManager.java:422)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.unmarshall(GridCacheIoManager.java:1576)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:584)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:386)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:312)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:102)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:301)
> at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1565)
> at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1189)
> at org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:130)
> at org.apache.ignite.internal.managers.communication.GridIoManager$8.run(GridIoManager.java:1092)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748){code}
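The clsLdrId in the log above appears to embed the ID of the node that created the class loader, which is how it can end up pointing at a node that has already left. The sketch below splits such an ID into its parts. It assumes the printed form is "<hex local counter>-<node UUID>", which is what the example value looks like; the class ClassLoaderIdParts and its parse method are hypothetical and are not Ignite API.

```java
import java.util.UUID;

// Hypothetical helper, not Ignite API. Assumes a class loader ID printed as
// "<hex local counter>-<node UUID>", matching the clsLdrId shape in the log.
final class ClassLoaderIdParts {
    final long localId;  // Local counter part (hex before the first dash).
    final UUID nodeId;   // Node that created the class loader.

    private ClassLoaderIdParts(long localId, UUID nodeId) {
        this.localId = localId;
        this.nodeId = nodeId;
    }

    static ClassLoaderIdParts parse(String clsLdrId) {
        int dash = clsLdrId.indexOf('-');

        if (dash <= 0)
            throw new IllegalArgumentException("Unexpected class loader ID: " + clsLdrId);

        long localId = Long.parseUnsignedLong(clsLdrId.substring(0, dash), 16);
        UUID nodeId = UUID.fromString(clsLdrId.substring(dash + 1));

        return new ClassLoaderIdParts(localId, nodeId);
    }

    public static void main(String[] args) {
        // Values copied from the log above.
        ClassLoaderIdParts p = parse("c32670e3071-d30ee64b-0833-45d4-abbe-fb6282669caa");
        UUID senderId = UUID.fromString("727757ed-4ad4-4779-bda9-081525725cce");

        System.out.println(p.nodeId);                  // d30ee64b-0833-45d4-abbe-fb6282669caa
        System.out.println(p.nodeId.equals(senderId)); // false
    }
}
```

Under that assumption, the loader in this message belongs to a node other than the sender (senderId=727757ed-...), so the receiver depends on a third node still being in the topology.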
> There is no reliable reproducer yet, but it seems we should prevent this
> situation in general, for example:
> 1) Check the correctness of the message before it is sent, inside
> GridCacheDeploymentManager#prepare. If the corresponding class loader exists
> on the local node, we can try to fix the message by replacing the wrong class
> loader with the local one.
> 2) Log suspicious deployments received from GridDeploymentManager#deploy;
> the caches may contain obsolete deployments.
> 3) Possibly remove this assertion. The sender node should have this class and
> we can use it as the class loader ID; if it does not, we will get an exception
> in finishUnmarshall ("Failed to peer load class") and can handle that
> situation with GridCacheIoManager#processFailedMessage.
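Proposals 1) and 3) can be sketched as plain control flow. This is a simplified model under stated assumptions, not the Ignite implementation: DepInfo, prepare, resolveLoaderNode and the class name com.example.MyFilter are all hypothetical, and the receiver-side fallback to the sender is one reading of proposal 3).

```java
import java.util.Set;
import java.util.UUID;

// Hypothetical model of proposals 1) and 3); none of these names are Ignite API.
final class StaleLoaderSketch {
    /** Simplified deployment info: a class name and the node owning its class loader. */
    static final class DepInfo {
        final String clsName;
        final UUID ldrNodeId;

        DepInfo(String clsName, UUID ldrNodeId) {
            this.clsName = clsName;
            this.ldrNodeId = ldrNodeId;
        }
    }

    /**
     * Proposal 1), sender side (the role GridCacheDeploymentManager#prepare would play):
     * if the loader's owner has left but the class is available locally,
     * point the message at the local node's loader instead.
     */
    static DepInfo prepare(DepInfo info, Set<UUID> aliveNodes, UUID locNodeId, Set<String> locClasses) {
        if (!aliveNodes.contains(info.ldrNodeId) && locClasses.contains(info.clsName))
            return new DepInfo(info.clsName, locNodeId);

        return info; // Owner still alive, or nothing local to substitute.
    }

    /**
     * Proposal 3), receiver side: instead of asserting, fall back to the sender
     * as the class source. If the class is missing there too, peer loading is
     * expected to fail later and the message goes down the failed-message path.
     */
    static UUID resolveLoaderNode(DepInfo info, UUID sndNodeId, Set<UUID> aliveNodes) {
        return aliveNodes.contains(info.ldrNodeId) ? info.ldrNodeId : sndNodeId;
    }

    public static void main(String[] args) {
        UUID left = UUID.fromString("d30ee64b-0833-45d4-abbe-fb6282669caa");
        UUID snd = UUID.fromString("727757ed-4ad4-4779-bda9-081525725cce");
        Set<UUID> alive = Set.of(snd);

        DepInfo stale = new DepInfo("com.example.MyFilter", left);

        // Sender has the class locally, so the stale loader reference is replaced.
        DepInfo fixed = prepare(stale, alive, snd, Set.of("com.example.MyFilter"));
        System.out.println(fixed.ldrNodeId.equals(snd)); // true

        // Unfixed message: the receiver resolves to the sender instead of asserting.
        System.out.println(resolveLoaderNode(stale, snd, alive).equals(snd)); // true
    }
}
```

Keeping both checks lets a stale ID be repaired early when possible, while the receiver still degrades to an ordinary error instead of an AssertionError when it is not.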