[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

Vipul Thakur (Jira) Tue, 12 Dec 2023 05:09:03 -0800


    [ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795735#comment-17795735
 ]


Vipul Thakur commented on IGNITE-21059:
---------------------------------------

Evidence that the transaction timeout is enabled on the client side:

 

[2023-11-30T14:19:01,783][ERROR][grid-timeout-worker-#326%EVENT_PROCESSING%][GridDhtColocatedCache]
 <EventCustomerCache> Failed to acquire lock for request: GridNearLockRequest 
[topVer=AffinityTopologyVersion [topVer=93, minorTopVer=0], miniId=1, 
dhtVers=GridCacheVersion[] [null], taskNameHash=0, createTtl=-1, accessTtl=-1, 
flags=3, txLbl=null, filter=null, super=GridDistributedLockRequest 
[nodeId=62fdf256-6130-4ef3-842c-b2078f6e6c07, nearXidVer=GridCacheVersion 
[topVer=312674007, order=1701333641101, nodeOrder=53, dataCenterId=0], 
threadId=372, futId=9c4a6212c81-c17f568a-3419-42a6-9042-7a1f3281301c, 
timeout=30000, isInTx=true, isInvalidate=false, isRead=true, 
isolation=REPEATABLE_READ, retVals=[true], txSize=0, flags=0, keysCnt=1, 
super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=312674007, 
order=1701333641101, nodeOrder=53, dataCenterId=0], committedVers=null, 
rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=-885490198, 
super=GridCacheMessage [msgId=55444220, depInfo=null, 
lastAffChangedTopVer=AffinityTopologyVersion [topVer=53, minorTopVer=0], 
err=null, skipPrepare=false]]]]]
[2023-11-30T14:19:44,579][ERROR][grid-timeout-worker-#326%EVENT_PROCESSING%][GridDhtColocatedCache]
 <EventCustomerCache> Failed to acquire lock for request: GridNearLockRequest 
[topVer=AffinityTopologyVersion [topVer=93, minorTopVer=0], miniId=1, 
dhtVers=GridCacheVersion[] [null], taskNameHash=0, createTtl=-1, accessTtl=-1, 
flags=3, txLbl=null, filter=null, super=GridDistributedLockRequest 
[nodeId=62fdf256-6130-4ef3-842c-b2078f6e6c07, nearXidVer=GridCacheVersion 
[topVer=312674007, order=1701333641190, nodeOrder=53, dataCenterId=0], 
threadId=897, futId=a3ba6212c81-c17f568a-3419-42a6-9042-7a1f3281301c, 
*timeout=30000, isInTx=true, isInvalidate=false, isRead=true, 
isolation=REPEATABLE_READ,* retVals=[true], txSize=0, flags=0, keysCnt=1, 
super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=312674007, 
order=1701333641190, nodeOrder=53, dataCenterId=0], committedVers=null, 
rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=-885490198, 
super=GridCacheMessage [msgId=55444392, depInfo=null, 
lastAffChangedTopVer=AffinityTopologyVersion [topVer=53, minorTopVer=0], 
err=null, skipPrepare=false]]]]]
org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException: Failed 
to acquire lock within provided timeout for transaction [timeout=30000, 
tx=GridDhtTxLocal[xid=c8a166f1c81-00000000-12a3-06d7-0000-000000000001, 
xidVersion=GridCacheVersion [topVer=312674007, order=1701333834380, 
nodeOrder=1, dataCenterId=0], nearXidVersion=GridCacheVersion 
[topVer=312674007, order=1701333641190, nodeOrder=53, dataCenterId=0], 
concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=MARKED_ROLLBACK, 
invalidate=false, rollbackOnly=true, 
nodeId=f751efe5-c44c-4b3c-bcd3-dd5866ec0bdd, timeout=30000, 
startTime=1701334154571, *duration=30003*]]
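
As a rough cross-check of the excerpt above, the relevant fields can be pulled out of the exception text with a small script. This is only a sketch: the helper name `tx_fields` is hypothetical, and `SAMPLE` is an abridged copy of the exception message above, not the full log line.

```python
import re

# Abridged copy of the exception message from the log excerpt above.
SAMPLE = (
    "org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException: "
    "Failed to acquire lock within provided timeout for transaction "
    "[timeout=30000, tx=GridDhtTxLocal[concurrency=PESSIMISTIC, "
    "isolation=REPEATABLE_READ, state=MARKED_ROLLBACK, rollbackOnly=true, "
    "timeout=30000, startTime=1701334154571, duration=30003]]"
)


def tx_fields(message):
    """Return the key=value pairs of interest found in a log message.

    Hypothetical helper: when a key repeats (timeout appears twice),
    the last occurrence wins.
    """
    wanted = {"timeout", "duration", "concurrency", "isolation", "state"}
    fields = {}
    for key, value in re.findall(r"(\w+)=([\w.-]+)", message):
        if key in wanted:
            fields[key] = value
    return fields


fields = tx_fields(SAMPLE)
timeout_ms = int(fields["timeout"])
duration_ms = int(fields["duration"])

# Show the transaction mode and whether it ran at least as long as its timeout.
print(fields["concurrency"], fields["isolation"], fields["state"])
print("timed out:", duration_ms >= timeout_ms)
```

A duration of 30003 ms against a timeout of 30000 ms suggests that the rollback was triggered by the transaction timeout rather than by some other failure.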

> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> --------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-21059
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21059
>             Project: Ignite
>          Issue Type: Bug
>          Components: binary, clients
>    Affects Versions: 2.14
>            Reporter: Vipul Thakur
>            Priority: Critical
>         Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>
> We recently upgraded from 2.7.6 to 2.14 because of an issue observed in our
> production environment, where the cluster would hang during partition
> map exchange.
> I raised the following ticket for Ignite 2.7.6 a while back:
> https://issues.apache.org/jira/browse/IGNITE-13298
> The migration to Apache Ignite 2.14 went smoothly, but on the third day we
> saw cluster traffic dip again.
> We have 4 nodes in the cluster, for which we provide 400 GB of RAM and more
> than 1 TB of HDD.
> The configuration is attached for review.
> I have also attached the server logs from the time the issue occurred.
> We have set a transaction timeout and a socket timeout on both the server and
> the client side for our write operations, but the cluster still sometimes
> hangs: all our get calls get stuck, and gradually everything freezes,
> including our JMS listener threads, until every thread is blocked.
> As a result, our read services, which do not even use transactions to retrieve
> data, also start to choke, ultimately leading to a dip in end-user traffic.
> We were hoping the product upgrade would help, but that has not been the case
> so far.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
