[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800425#comment-17800425
 ] 

Vipul Thakur commented on IGNITE-21059:
---------------------------------------

Hi 

[~zstan] 

*PFB for another such scenario from the logs file. This is the same kind of 
logs.[^ignite-server-nohup.out] , may be if you search for it in this file you 
will get more context for you to observe.* 

[2023-11-30T14:21:46,945][ERROR][grid-timeout-worker-#326%EVENT_PROCESSING%][GridDhtColocatedCache]
 <EventCustomerCache> Failed to acquire lock for request: GridNearLockRequest 
[topVer=AffinityTopologyVersion [topVer=93, minorTopVer=0], miniId=1, 
dhtVers=GridCacheVersion[] [null], taskNameHash=0, createTtl=-1, accessTtl=-1, 
flags=3, txLbl=null, filter=null, super=GridDistributedLockRequest 
[nodeId=62fdf256-6130-4ef3-842c-b2078f6e6c07, nearXidVer=GridCacheVersion 
[topVer=312674007, order=1701333641522, nodeOrder=53, dataCenterId=0], 
threadId=567, futId=13db6212c81-c17f568a-3419-42a6-9042-7a1f3281301c, 
timeout=30000, isInTx=true, isInvalidate=false, isRead=true, 
isolation=REPEATABLE_READ, retVals=[true], txSize=0, flags=0, keysCnt=1, 
super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=312674007, 
order=1701333641522, nodeOrder=53, dataCenterId=0], committedVers=null, 
rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=-885490198, 
super=GridCacheMessage [msgId=55445052, depInfo=null, 
lastAffChangedTopVer=AffinityTopologyVersion [topVer=53, minorTopVer=0], 
err=null, skipPrepare=false]]]]]
org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException: Failed 
to acquire lock within provided timeout for transaction [timeout=30000, 
tx=GridDhtTxLocal[xid=5f4b66f1c81-00000000-12a3-06d7-0000-000000000001, 
xidVersion=GridCacheVersion [topVer=312674007, order=1701333873909, 
nodeOrder=1, dataCenterId=0], nearXidVersion=GridCacheVersion 
[topVer=312674007, order=1701333641522, nodeOrder=53, dataCenterId=0], 
concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=MARKED_ROLLBACK, 
invalidate=false, rollbackOnly=true, 
nodeId=f751efe5-c44c-4b3c-bcd3-dd5866ec0bdd, timeout=30000, 
startTime=1701334276938, duration=30003]]
    at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:1798)
 ~[ignite-core-2.14.0.jar:2.14.0]
    at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:1746)
 ~[ignite-core-2.14.0.jar:2.14.0]
    at 
org.apache.ignite.internal.util.future.GridEmbeddedFuture$2.applyx(GridEmbeddedFuture.java:86)
 ~[ignite-core-2.14.0.jar:2.14.0]
    at 
org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:292)
 [ignite-core-2.14.0.jar:2.14.0]
    at 
org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:285)
 [ignite-core-2.14.0.jar:2.14.0]
    at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:464)
 [ignite-core-2.14.0.jar:2.14.0]
    at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:348)
 [ignite-core-2.14.0.jar:2.14.0]
    at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:336)
 [ignite-core-2.14.0.jar:2.14.0]
    at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:576)
 [ignite-core-2.14.0.jar:2.14.0]
    at 
org.apache.ignite.internal.processors.cache.GridCacheCompoundIdentityFuture.onDone(GridCacheCompoundIdentityFuture.java:56)
 [ignite-core-2.14.0.jar:2.14.0]
    at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:555)
 [ignite-core-2.14.0.jar:2.14.0]
    at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onComplete(GridDhtLockFuture.java:807)
 [ignite-core-2.14.0.jar:2.14.0]
    at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.access$900(GridDhtLockFuture.java:93)
 [ignite-core-2.14.0.jar:2.14.0]
    at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$LockTimeoutObject.onTimeout(GridDhtLockFuture.java:1207)
 [ignite-core-2.14.0.jar:2.14.0]
    at 
org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:234)
 [ignite-core-2.14.0.jar:2.14.0]
    at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) 
[ignite-core-2.14.0.jar:2.14.0]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_351]
2023-11-30T14:21:46,945][ERROR][grid-timeout-worker-#326%EVENT_PROCESSING%][GridDhtColocatedCache]
 <EventCustomerCache> Failed to acquire lock for request: GridNearLockRequest 
[topVer=AffinityTopologyVersion [topVer=93, minorTopVer=0], miniId=1, 
dhtVers=GridCacheVersion[] [null], taskNameHash=0, createTtl=-1, accessTtl=-1, 
flags=3, txLbl=null, filter=null, super=GridDistributedLockRequest 
[nodeId=62fdf256-6130-4ef3-842c-b2078f6e6c07, nearXidVer=GridCacheVersion 
[topVer=312674007, order=1701333641522, nodeOrder=53, dataCenterId=0], 
threadId=567, futId=13db6212c81-c17f568a-3419-42a6-9042-7a1f3281301c, 
timeout=30000, isInTx=true, isInvalidate=false, isRead=true, 
isolation=REPEATABLE_READ, retVals=[true], txSize=0, flags=0, keysCnt=1, 
super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=312674007, 
order=1701333641522, nodeOrder=53, dataCenterId=0], committedVers=null, 
rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=-885490198, 
super=GridCacheMessage [msgId=55445052, depInfo=null, 
lastAffChangedTopVer=AffinityTopologyVersion [topVer=53, minorTopVer=0], 
err=null, skipPrepare=false]]]]]
2023-11-30T14:21:48,343][INFO 
][db-checkpoint-thread-#762%EVENT_PROCESSING%][Checkpointer] Skipping 
checkpoint (no pages were modified) [checkpointBeforeLockTime=1ms, 
checkpointLockWait=0ms, checkpointListenersExecuteTime=1ms, 
checkpointLockHoldTime=4ms, reason='timeout']
2023-11-30T14:21:55,776][WARN ][sys-#18981%EVENT_PROCESSING%][diagnostic] First 
10 long running transactions [total=15]
2023-11-30T14:21:55,776][WARN ][sys-#18981%EVENT_PROCESSING%][diagnostic] >>> 
Transaction [startTime=14:08:02,187, curTime=14:21:55,743, tx=GridDhtTxRemote 
[nearNodeId=46de26cf-a0d2-4a99-b278-734eb44f1330, 
rmtFutId=6f2623f1c81-c87b49b4-cfa3-4140-b9e7-69e2a57b6830, 
nearXidVer=GridCacheVersion [topVer=312674007, order=1701333633815, 
nodeOrder=52, dataCenterId=0], storeWriteThrough=false, 
super=GridDistributedTxRemoteAdapter [explicitVers=null, started=true, 
commitAllowed=0, txState=IgniteTxRemoteSingleStateImpl [entry=IgniteTxEntry 
[txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=81, val=9022054169, 
hasValBytes=true], cacheId=1760547175], val=TxEntryValueHolder 
[val=com.jio.digitalapi.eventprocessing.persistent.datamodel.ServiceEntity 
[idHash=23146844, hash=-1542268073, serviceState=TERMINATED, 
accountId=100962392318, timeLastModifed=2023-11-30T14:08:02.188, 
customerId=1487385033, entityId=9022054169, 
timeCreated=2020-05-06T01:11:23.682, serviceId=9022054169, productType=1, 
serviceCharingIds=919022054169|405864090495748|405864144962514], op=UPDATE], 
prevVal=TxEntryValueHolder [val=null, op=NOOP], oldVal=TxEntryValueHolder 
[val=null, op=NOOP], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, 
conflictVer=null, explicitVer=null, dhtVer=null, filters=CacheEntryPredicate[] 
[], filtersPassed=false, filtersSet=false, entry=GridDhtCacheEntry 
[rdrs=ReaderId[] [], part=81, super=GridDistributedCacheEntry 
[super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=81, val=9022054169, 
hasValBytes=true], 
val=com.jio.digitalapi.eventprocessing.persistent.datamodel.ServiceEntity 
[idHash=814654353, hash=-1899253163, serviceState=SUSPENDED, 
accountId=100962392318, timeLastModifed=2023-11-21T13:00:14.527, 
customerId=1487385033, entityId=9022054169, 
timeCreated=2020-05-06T01:11:23.682, serviceId=9022054169, productType=1, 
serviceCharingIds=919022054169|405864090495748|405864144962514], 
ver=GridCacheVersion [topVer=310345733, order=1701229328778, nodeOrder=4, 
dataCenterId=0], hash=621622236, extras=GridCacheMvccEntryExtras 
[mvcc=GridCacheMvcc [locs=null, rmts=LinkedList [GridCacheMvccCandidate 
[nodeId=ca92e2c7-b8ed-42c8-a89b-748815640c78, ver=GridCacheVersion 
[topVer=312674007, order=1701333639730, nodeOrder=4, dataCenterId=0], 
threadId=117, id=47917537, topVer=AffinityTopologyVersion [topVer=-1, 
minorTopVer=0], reentry=null, otherNodeId=46de26cf-a0d2-4a99-b278-734eb44f1330, 
otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, 
serOrder=null, key=KeyCacheObjectImpl [part=81, val=9022054169, 
hasValBytes=true], 
masks=local=0|owner=0|ready=0|reentry=0|used=0|tx=1|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0,
 prevVer=null, nextVer=null]]]], flags=2]]], prepared=1, locked=false, 
nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, 
partUpdateCntr=26980702, serReadVer=null, xidVer=null]], txLbl=null, 
super=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=312674007, 
order=1701333639730, nodeOrder=4, dataCenterId=0], writeVer=GridCacheVersion 
[topVer=312674007, order=1701333639731, nodeOrder=4, dataCenterId=0], 
implicit=false, loc=false, threadId=117, startTime=1701333482187, 
nodeId=ca92e2c7-b8ed-42c8-a89b-748815640c78, isolation=REPEATABLE_READ, 
concurrency=PESSIMISTIC, timeout=30000, sysInvalidate=false, sys=false, plc=2, 
commitVer=null, finalizing=NONE, invalidParts=null, state=PREPARED, 
timedOut=false, topVer=AffinityTopologyVersion [topVer=93, minorTopVer=0], 
mvccSnapshot=null, skipCompletedVers=false, parentTx=null, duration=833556ms, 
onePhaseCommit=false]]]]

> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> --------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-21059
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21059
>             Project: Ignite
>          Issue Type: Bug
>          Components: binary, clients
>    Affects Versions: 2.14
>            Reporter: Vipul Thakur
>            Priority: Critical
>         Attachments: Ignite_server_logs.zip, cache-config-1.xml, 
> client-service.zip, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup-1.out, ignite-server-nohup.out, long_txn_.png
>
>
> We have recently upgraded from 2.7.6 to 2.14 due to the issue observed in 
> production environment where cluster would go in hang state due to partition 
> map exchange.
> Please find the below ticket which i created a while back for ignite 2.7.6
> https://issues.apache.org/jira/browse/IGNITE-13298
> So we migrated the apache ignite version to 2.14 and upgrade happened 
> smoothly but on the third day we could see cluster traffic dip again. 
> We have 5 nodes in a cluster where we provide 400 GB of RAM and more than 1 
> TB SDD.
> PFB for the attached config.[I have added it as attachment for review]
> I have also added the server logs from the same time when issue happened.
> We have set txn timeout as well as socket timeout both at server and client 
> end for our write operations but seems like sometimes cluster goes into hang 
> state and all our get calls are stuck and slowly everything starts to freeze 
> our jms listener threads and every thread reaches a choked up state in 
> sometime.
> Due to which our read services which does not even use txn to retrieve data 
> also starts to choke. Ultimately leading to end user traffic dip.
> We were hoping product upgrade will help but that has not been the case till 
> now. 
>  
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to