[ https://issues.apache.org/jira/browse/IGNITE-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311467#comment-17311467 ]

Ilya Kasnacheev commented on IGNITE-14445:
------------------------------------------

It also produces other errors, such as an assertion error on the client:

{code}
[2021-03-30 13:46:05,338][ERROR][tcp-client-disco-msg-worker-#216%distributed.IgniteCacheManyClientsTest8%-#2083%distributed.IgniteCacheManyClientsTest8%][IgniteTestResources]
 Critical system error detected. Will be handled accordingly to configured 
handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
[type=CRITICAL_ERROR, err=java.lang.AssertionError: lastVer=4, newVer=6, 
locNode=TcpDiscoveryNode [id=009a68fa-161f-4eef-b7ed-bfb115d00008, 
consistentId=009a68fa-161f-4eef-b7ed-bfb115d00008, addrs=ArrayList [127.0.0.1], 
sockAddrs=HashSet [/127.0.0.1:0], discPort=0, order=0, intOrder=0, 
lastExchangeTime=1617101161213, loc=true, ver=2.9.1#20210115-sha1:de970af4, 
isClient=true], msg=TcpDiscoveryNodeAddFinishedMessage 
[nodeId=c4c43b12-3293-4fbd-948d-acd211b00004, super=TcpDiscoveryAbstractMessage 
[sndNodeId=e278b5eb-2faf-4602-844f-d70467200000, 
id=b0a5db28871-e278b5eb-2faf-4602-844f-d70467200000, 
verifierNodeId=e278b5eb-2faf-4602-844f-d70467200000, topVer=6, pendingIdx=0, 
failedNodes=null, isClient=false]]]]
java.lang.AssertionError: lastVer=4, newVer=6, locNode=TcpDiscoveryNode 
[id=009a68fa-161f-4eef-b7ed-bfb115d00008, 
consistentId=009a68fa-161f-4eef-b7ed-bfb115d00008, addrs=ArrayList [127.0.0.1], 
sockAddrs=HashSet [/127.0.0.1:0], discPort=0, order=0, intOrder=0, 
lastExchangeTime=1617101161213, loc=true, ver=2.9.1#20210115-sha1:de970af4, 
isClient=true], msg=TcpDiscoveryNodeAddFinishedMessage 
[nodeId=c4c43b12-3293-4fbd-948d-acd211b00004, super=TcpDiscoveryAbstractMessage 
[sndNodeId=e278b5eb-2faf-4602-844f-d70467200000, 
id=b0a5db28871-e278b5eb-2faf-4602-844f-d70467200000, 
verifierNodeId=e278b5eb-2faf-4602-844f-d70467200000, topVer=6, pendingIdx=0, 
failedNodes=null, isClient=false]]
        at org.apache.ignite.spi.discovery.tcp.ClientImpl.updateTopologyHistory(ClientImpl.java:912)
        at org.apache.ignite.spi.discovery.tcp.ClientImpl.access$3700(ClientImpl.java:146)
        at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processNodeAddFinishedMessage(ClientImpl.java:2366)
        at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processDiscoveryMessage(ClientImpl.java:2149)
        at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1988)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at org.apache.ignite.spi.discovery.tcp.ClientImpl$1.body(ClientImpl.java:307)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
{code}

Or, more importantly, an NPE on the server node:
{code}
[2021-03-30 14:07:05,133][ERROR][sys-#161%distributed.IgniteCacheManyClientsTest2%][IgniteTestResources]
 Critical system error detected. Will be handled accordingly to configured 
handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
[type=CRITICAL_ERROR, err=class o.a.i.IgniteCheckedException: null]]
class org.apache.ignite.IgniteCheckedException: null
        at org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7563)
        at org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:837)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
        at org.apache.ignite.internal.IgniteFeatures.nodeSupports(IgniteFeatures.java:160)
        at org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxRecoveryInitRunnable.isMvccRecoveryMessageRequired(IgniteTxManager.java:3335)
        at org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxRecoveryInitRunnable.run(IgniteTxManager.java:3246)
        at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7117)
        at org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
        ... 4 more
{code}

Maybe we should make a crazy monkey test out of it.

> "Remote node does not observe current" after failure by not receiving metrics 
> from client
> -----------------------------------------------------------------------------------------
>
>                 Key: IGNITE-14445
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14445
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.9.1
>            Reporter: Ilya Kasnacheev
>            Priority: Major
>         Attachments: ignite-server-impl.patch
>
>
> A server node might fail a client node due to pauses in the network 
> connection:
> [15:07:16,330][WARNING][tcp-disco-msg-worker-[11cf0c06 10.212.120.71:57500 
> crd]-#2%hh_DynamicGrid_v2%][TcpDiscoverySpi] Failing client node due to not 
> receiving metrics updates from client node within 
> 'IgniteConfiguration.clientFailureDetectionTimeout' (consider increasing 
> configuration property) [timeout=120000, node=TcpDiscoveryNode 
> [id=9dbcfb86-a60e-4382-904f-57bffbe18c5c,consistentId=73B5811B-9644-48FD-A533-B4609FDAD591,
>  addrs=ArrayList [10.212.120.190], sockAddrs=HashSet 
> [VWNV02AX07080.HH.com/10.212.120.190:0], discPort=0, order=488, intOrder=248, 
> lastExchangeTime=1612397142960, loc=false, ver=2.8.1#20200521-sha1:86422096, 
> isClient=true]]
> Then the client node never realizes that it has been dropped by the cluster 
> and keeps trying to connect endlessly. I'm not sure what discovery does on 
> the client node:
> {code}
> [15:07:42,689][SEVERE][Thread-219][TcpCommunicationSpi] Failed to send 
> message to remote node [node=TcpDiscoveryNode 
> [id=83fd7c70-839d-46ca-969f-bbb9661d6ab2, consistentId=127.1.1.1:57500, 
> addrs=ArrayList [127.1.1.1], sockAddrs=HashSet [test.com/127.1.1.1:57500], 
> discPort=57500, order=1, intOrder=1, lastExchangeTime=1612397256785, 
> loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], 
> msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, 
> timeout=0, skipOnTimeout=false, msg=GridNearAtomicFullUpdateRequest 
> [keys=ArrayList [UserKeyCacheObjectImpl [part=292, 
> val=TestModel:TEST|bbf4da4d-c3d7-4b46-98b6-0de70c30f668, hasValBytes=true]], 
> conflictTtls=null, conflictExpireTimes=null, 
> expiryPlc=org.apache.ignite.internal.processors.platform.cache.expiry.PlatformExpiryPolicy@3fb1b76e,
>  initSize=1, filter=null, parent=GridNearAtomicAbstractUpdateRequest 
> [res=null, flags=keepBinary]]]]
> class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException: 
> Remote node does not observe current node in topology : 
> 83fd7c70-839d-46ca-969f-bbb9661d6ab2
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
> at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
> at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1257)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1296)
> at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:312)
> {code}
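
For readers unfamiliar with the failure mode in the quoted description, here is a minimal sketch of a server-side watchdog that drops a client once no metrics update has arrived within a timeout. This is not Ignite code: the class ClientMetricsWatchdog and all of its methods are hypothetical names invented for illustration, and the real TcpDiscoverySpi logic is considerably more involved. The 120000 ms value mirrors the timeout shown in the log above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical model of the server side; not part of Apache Ignite. */
class ClientMetricsWatchdog {
    /** Plays the role of clientFailureDetectionTimeout, in milliseconds. */
    private final long timeoutMs;

    /** Last time (ms) a metrics update was seen from each client still in topology. */
    private final Map<UUID, Long> lastMetricsUpdate = new HashMap<>();

    ClientMetricsWatchdog(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Records a metrics update received from a client. */
    void onMetricsUpdate(UUID clientId, long nowMs) {
        lastMetricsUpdate.put(clientId, nowMs);
    }

    /** Removes and returns every client that has been silent for longer than the timeout. */
    List<UUID> failSilentClients(long nowMs) {
        List<UUID> failed = new ArrayList<>();
        lastMetricsUpdate.entrySet().removeIf(e -> {
            if (nowMs - e.getValue() > timeoutMs) {
                failed.add(e.getKey());
                return true;
            }
            return false;
        });
        return failed;
    }

    /** True while the server still considers the client part of the topology. */
    boolean isInTopology(UUID clientId) {
        return lastMetricsUpdate.containsKey(clientId);
    }

    public static void main(String[] args) {
        ClientMetricsWatchdog watchdog = new ClientMetricsWatchdog(120_000L);
        UUID client = UUID.randomUUID();

        watchdog.onMetricsUpdate(client, 0L);
        // A network pause longer than the timeout: the server drops the client.
        List<UUID> failed = watchdog.failSilentClients(120_001L);
        System.out.println("failed=" + failed.size()
            + ", inTopology=" + watchdog.isInTopology(client));
    }
}
```

In this model the dropped client is never told that it was removed, which matches the symptom reported here: from the client's point of view the connection merely paused, while the server no longer has it in topology.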



