Vaibhav Shrivastava created IGNITE-23475:
--------------------------------------------
Summary: Ignite node gets stuck on the IgniteCache.remove method
for a long time
Key: IGNITE-23475
URL: https://issues.apache.org/jira/browse/IGNITE-23475
Project: Ignite
Issue Type: Bug
Affects Versions: 2.15
Reporter: Vaibhav Shrivastava
Hello Team
We are facing one issue with ignite 2.15
There is a node lets say X in the ignite cluster which established the outgoing
communication connection with ignite server node Y at 4:35:18
As per the logs of X below
{noformat}
04:35:18:703657 | class
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi]: {INFO} Action
{Established outgoing communication connection [locAddr=/x.x.x.x:58454,
rmtAddr=Y/y.y.y.y:48102]} Thread
{grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%}{noformat}
Now at 05:46:18, X tried to update the Ignite Cache synchronously using
IgniteCache.remove method
But X receive warnings after around 70 seconds of IgniteCache.remove method
{noformat}
05:47:29:776422|0153-00271:Tracer_Ignite
[org.apache.ignite.internal.diagnostic]: {WARN} Action {First 10 long running
transactions [total=1]} Thread {sys-#103%WebCluster%}
05:47:29:776576|6956-00271:Tracer_Ignite
[org.apache.ignite.internal.diagnostic]: {WARN} Action {>>> Transaction
[startTime=05:46:18,514, curTime=05:47:29,763, systemTime=71248, userTime=1,
tx=GridNearTxLocal [mappings=IgniteTxMappingsSingleImpl
[mapping=GridDistributedTxMapping [entries=LinkedHashSet [IgniteTxEntry
[txKey=IgniteTxKey [key=com.host1.ignite........... thread=<failed to find
active thread 47>
05:47:29:786789|0851-00269:Tracer_Ignite
[org.apache.ignite.internal.diagnostic]: {INFO} Action
{GridNearOptimisticTxPrepareFuture waiting for local keys lock
[node=60cc9b4f-ca79-4124-89b3-0dc8caa5e7f4, topVer=AffinityTopologyVersion
[topVer=13, minorTopVer=0], allKeysAdded=true, keys=GridConcurrentHashSet
[elements=KeySetView [IgniteTxKey [key=com.X.caches.key [idHash=401413886,
hash=-1793151138, sessionComponentName=temp_name], cacheId=1968825002]]]]
{noformat}
X continued to get these above warning
{noformat}
05:49:29:773714|0152-00202:Tracer_Ignite
[org.apache.ignite.internal.diagnostic]: {WARN} Action {First 10 long running
transactions [total=1]} Thread {sys-#83%WebCluster%}{noformat}
These warnings are coming after every 2mins
{noformat}
05:55:29:805464|0883-00119:Tracer_Ignite
[org.apache.ignite.internal.diagnostic]: {INFO} Action
{GridNearOptimisticTxPrepareFuture waiting for local keys lock
[node=60cc9b4f-ca79-4124-89b3-0dc8caa5e7f4, topVer=AffinityTopologyVersion
[topVer=13, minorTopVer=0], allKeysAdded=true, keys=GridConcurrentHashSet
[elements=KeySetView [IgniteTxKey [key=com.X.caches.key [idHash=401413886,
hash=-1793151138], cacheId=1968825002]]]]{noformat}
At 06:02:10 on the X we receive the exception of connection timed out
{noformat}
06:02:10:333953|2921-00117:Tracer_Ignite [class
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi]: {ERROR} Action
{Failed to process selector key [ses=GridSelectorNioSessionImpl
[worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1,
bytesRcvd=180, bytesSent=32428, bytesRcvd0=0, bytesSent0=0, select=true,
super=GridWorker [name=grid-nio-worker-tcp-comm-1,
igniteInstanceName=TcpCommunicationSpi, finished=false,
heartbeatTs=1729054929329, hashCode=909604085, interrupted=false,
runner=grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%]]],
writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=null,
outRecovery=GridNioRecoveryDescriptor [acked=150, resendCnt=0, rcvCnt=0,
sentCnt=151, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode
[id=d20abb30-8118-4066-a13e-01ad3abc1548, consistentId=host1,127.0.0.1:48500,
addrs=ArrayList [X, 127.0.0.1], sockAddrs=null, discPort=48500, order=1,
intOrder=1, lastExchangeTime=1729035688365, loc=false,
ver=2.15.0#20230425-sha1:f98f7f35, isClient=false], connected=false,
connectCnt=2, queueLimit=4096, reserveCnt=2, pairedConnections=true],
closeSocket=true,
outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric@69a257d1,
super=GridNioSessionImpl [locAddr=/X:58454, rmtAddr=Y/y.y.y.y:48102,
createTime=1729049718699, closeTime=0, bytesSent=4431, bytesRcvd=20,
bytesSent0=0, bytesRcvd0=0, sndSchedTime=1729054918122,
lastSndTime=1729053978514, lastRcvTime=1729054918122, readsPaused=false,
filterChain=FilterChain[filters=[GridNioCodecFilter
[parser=o.a.i.i.util.nio.GridDirectParser@6536b6c, directMode=true],
GridConnectionBytesVerifyFilter], accepted=false, markedForClose=false]]]}
Thread {grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%}
java.io.IOException: Connection timed out
at java.base/sun.nio.ch.SocketDispatcher.read0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:47)
at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:330)
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:284)
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:259)
at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:417)
at
org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processRead(GridNioServer.java:1351)
at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2526)
at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2281)
at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1910)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
at java.base/java.lang.Thread.run(Thread.java:840)
06:02:10:335005|0364-00117:Tracer_Ignite [class
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi]: {WARN} Action
{Client disconnected abruptly due to network connection loss or because the
connection was left open on application shutdown. [cls=class
o.a.i.i.util.nio.GridNioException, msg=Connection timed out]} Thread
{grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%} {noformat}
At ignite server side we received the connection closed error
{noformat}
06:02:10,377181 [53] DEBUG TcpCommunicationSpi(Ignite) - Session was closed but
there are unacknowledged messages, will try to reconnect
[rmtNode=60cc9b4f-ca79-4124-89b3-0dc8caa5e7f4]{noformat}
After this timeout a new outgoing connection got stablished and things started
working fine
--
This message was sent by Atlassian Jira
(v8.20.10#820010)