Ilya Kasnacheev created IGNITE-13824:
----------------------------------------
Summary: Suspected: "Connection reset by peer" of Communication
socket triggers FH
Key: IGNITE-13824
URL: https://issues.apache.org/jira/browse/IGNITE-13824
Project: Ignite
Issue Type: Bug
Components: networking
Affects Versions: 2.9
Reporter: Ilya Kasnacheev
I would expect a network error in tcp-comm-worker to never trigger a Failure
Handler, yet it happens:
{code}
[12/3/20 16:08:26:410 GMT] 000000bd IgniteKernal W Possible too long JVM
pause: 2418 milliseconds.
[12/3/20 16:08:27:465 GMT] 000000c5 TcpCommunicat W Client disconnected
abruptly due to network connection loss or because the connection was left open
on application shutdown. [cls=class o.a.i.i.util.nio.GridNioException,
msg=Connection reset by peer]
[12/3/20 16:08:27:411 GMT] 000000c5 TcpCommunicat E Failed to process
selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker
[super=AbstractNioClientWorker [idx=0, bytesRcvd=48849402273,
bytesSent=15994664546, bytesRcvd0=54446, bytesSent0=102, select=true,
super=GridWorker [name=grid-nio-worker-tcp-comm-0, igniteInstanceName=null,
finished=false, heartbeatTs=1607011706410, hashCode=433635054,
interrupted=false, runner=grid-nio-worker-tcp-comm-0-#51]]],
writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
inRecovery=GridNioRecoveryDescriptor [acked=9025120, resendCnt=0,
rcvCnt=9025150, sentCnt=9025152, reserved=true, lastAck=9025120,
nodeLeft=false, node=TcpDiscoveryNode [id=b3ca311e-077f-42a5-884a-807b539730b6,
consistentId=10.60.46.12:48500, addrs=ArrayList [10.60.46.12],
sockAddrs=HashSet [hex-wgc-p-web02/10.60.46.12:48500], discPort=48500, order=1,
intOrder=1, lastExchangeTime=1607006097079, loc=false,
ver=2.9.0#20201015-sha1:70742da8, isClient=false], connected=false,
connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false],
outRecovery=GridNioRecoveryDescriptor [acked=9025120, resendCnt=0,
rcvCnt=9025150, sentCnt=9025152, reserved=true, lastAck=9025120,
nodeLeft=false, node=TcpDiscoveryNode [id=b3ca311e-077f-42a5-884a-807b539730b6,
consistentId=10.60.46.12:48500, addrs=ArrayList [10.60.46.12],
sockAddrs=HashSet [hex-wgc-p-web02/10.60.46.12:48500], discPort=48500, order=1,
intOrder=1, lastExchangeTime=1607006097079, loc=false,
ver=2.9.0#20201015-sha1:70742da8, isClient=false], connected=false,
connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false],
closeSocket=true,
outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric@69a257d1,
super=GridNioSessionImpl [locAddr=/10.223.132.3:52550,
rmtAddr=/10.60.46.12:48100, createTime=1607006097572, closeTime=0,
bytesSent=15994657850, bytesRcvd=48849402273, bytesSent0=102, bytesRcvd0=54446,
sndSchedTime=1607006097572, lastSndTime=1607011706410,
lastRcvTime=1607011706410, readsPaused=false,
filterChain=FilterChain[filters=[GridNioCodecFilter
[parser=o.a.i.i.util.nio.GridDirectParser@93200255, directMode=true],
GridConnectionBytesVerifyFilter], accepted=false, markedForClose=false]]]
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:51)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:235)
at sun.nio.ch.IOUtil.read(IOUtil.java:204)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:394)
at
org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processRead(GridNioServer.java:1330)
at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2472)
at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2239)
at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1880)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:822)
[12/3/20 16:08:44:437 GMT] 000000c4 SystemOut O [16:08:44] Possible failure
suppressed accordingly to a configured handler
[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
igniteInstanceName=null, finished=false, heartbeatTs=1607011706420]]]
[12/3/20 16:08:44:436 GMT] 000000c4 W
java.util.logging.LogManager$RootLogger log Possible failure suppressed
accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler
[tryStop=false, timeout=0, super=AbstractFailureHandler
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
[name=tcp-comm-worker, igniteInstanceName=null, finished=false,
heartbeatTs=1607011706420]]]
class org.apache.ignite.IgniteException:
GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false,
heartbeatTs=1607011706420]
at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1806)
at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1801)
at
org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:234)
at
org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
at
org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:221)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:822)
[12/3/20 16:08:44:434 GMT] 000000c4 G W Thread
[name="tcp-comm-worker-#1-#63", id=211, state=WAITING, blockCnt=2, waitCnt=100]
[12/3/20 16:08:44:432 GMT] 000000c4 G E Blocked system-critical
thread has been detected. This can lead to cluster-wide undefined behaviour
[workerName=tcp-comm-worker, threadName=tcp-comm-worker-#1-#63, blockedFor=18s]
[12/3/20 16:09:14:486 GMT] 000000c4 SystemOut O [16:09:14] Possible failure
suppressed accordingly to a configured handler
[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
igniteInstanceName=null, finished=false, heartbeatTs=1607011736000]]]
{code}
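To relate the log timestamps to each other, here is a minimal sketch that converts the reported heartbeatTs values into wall-clock time and recomputes the blockedFor interval. It assumes heartbeatTs is epoch milliseconds and that the log date 12/3/20 means 3 December 2020 GMT. The class and method names (BlockedForCheck, blockedFor) are hypothetical and are not part of Ignite.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper, not Ignite code: relates heartbeatTs from the log
// to the time at which the blocked worker was reported.
public class BlockedForCheck {
    // Assumes heartbeatTsMillis is epoch milliseconds and detectedAtGmt is
    // the GMT timestamp of the "Blocked system-critical thread" log line.
    static Duration blockedFor(long heartbeatTsMillis, LocalDateTime detectedAtGmt) {
        Instant heartbeat = Instant.ofEpochMilli(heartbeatTsMillis);
        Instant detected = detectedAtGmt.toInstant(ZoneOffset.UTC);
        return Duration.between(heartbeat, detected);
    }

    public static void main(String[] args) {
        // First report: heartbeatTs=1607011706420, detected at 16:08:44:432 GMT.
        long firstHeartbeat = 1607011706420L;
        LocalDateTime firstDetected = LocalDateTime.of(2020, 12, 3, 16, 8, 44, 432_000_000);
        System.out.println("last heartbeat: " + Instant.ofEpochMilli(firstHeartbeat));
        System.out.println("blocked for ms: " + blockedFor(firstHeartbeat, firstDetected).toMillis());

        // Second report: heartbeatTs=1607011736000, logged at 16:09:14:486 GMT.
        long secondHeartbeat = 1607011736000L;
        LocalDateTime secondDetected = LocalDateTime.of(2020, 12, 3, 16, 9, 14, 486_000_000);
        System.out.println("last heartbeat: " + Instant.ofEpochMilli(secondHeartbeat));
        System.out.println("blocked for ms: " + blockedFor(secondHeartbeat, secondDetected).toMillis());
    }
}
```

Under these assumptions the first heartbeat of tcp-comm-worker falls at 16:08:26.420 GMT, about one second before the "Connection reset by peer" entries at 16:08:27, and the computed gap of roughly 18 seconds matches blockedFor=18s in the log.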
--
This message was sent by Atlassian Jira
(v8.3.4#803005)