[
https://issues.apache.org/jira/browse/IGNITE-25538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aleksey Plekhanov resolved IGNITE-25538.
----------------------------------------
Fix Version/s: 2.18
Release Note: Fixed ROLLED_BACK transaction removal from active
transaction list on timeout during initialization
Resolution: Fixed
Fixed by IGNITE-25541
> ROLLED_BACK transactions are not removed from active transactions list
> -----------------------------------------------------------------------
>
> Key: IGNITE-25538
> URL: https://issues.apache.org/jira/browse/IGNITE-25538
> Project: Ignite
> Issue Type: Bug
> Reporter: Mikhail Petrov
> Assignee: Aleksey Plekhanov
> Priority: Minor
> Labels: ise
> Fix For: 2.18
>
>
> The user can observe the following output of the `control.sh tx` command:
> {code:java}
> Matching transactions:
> TcpDiscoveryNode [id=34fd49ed-c325-4a93-a32c-3726c1c19130,
> addrs=[10.19.138.119], order=3, ver=16.1.3#20241226-sha1:900bfa69,
> isClient=false, consistentId=epk_rb_si_pplad-pprbrbepk0071.ca.sbrf.ru]
> Tx: [xid=0a2e8e50791-00000000-156e-2f01-0000-000000000013,
> label=UcpSearchServiceDecorator.searchByClientId, state=ROLLED_BACK,
> startTime=2025-05-26 23:53:58.515, duration=224437 sec,
> isolation=READ_COMMITTED, concurrency=PESSIMISTIC, topVer=N/A, timeout=0 sec,
> size=0, dhtNodes=[],
> nearXid=0a2e8e50791-00000000-156e-2f01-0000-000000000013,
> parentNodeIds=[86cc9e5e]]
> Tx: [xid=087e3040791-00000000-156e-2f01-0000-000000000030,
> label=bs-ucp-4g-update-service, state=ROLLED_BACK, startTime=2025-05-25
> 23:45:45.961, duration=311329 sec, isolation=READ_COMMITTED,
> concurrency=PESSIMISTIC, topVer=N/A, timeout=0 sec, size=0, dhtNodes=[],
> nearXid=087e3040791-00000000-156e-2f01-0000-000000000030,
> parentNodeIds=[60400a24]]
> Tx: [xid=0e60d620791-00000000-156e-2f01-0000-000000000035,
> label=CloudClientSearchService.byCriteria, state=ROLLED_BACK,
> startTime=2025-05-24 23:49:05.016, duration=397530 sec,
> isolation=READ_COMMITTED, concurrency=PESSIMISTIC, topVer=N/A, timeout=0 sec,
> size=0, dhtNodes=[],
> nearXid=0e60d620791-00000000-156e-2f01-0000-000000000035,
> parentNodeIds=[448e854c]]
> TcpDiscoveryNode [id=9f11128e-c5a2-4700-af6b-c4777edfa31b,
> addrs=[10.19.138.75], order=54, ver=16.1.3#20241226-sha1:900bfa69,
> isClient=false, consistentId=epk_rb_si_pplad-pprbrbepk0025.ca.sbrf.ru]
> Command [TX] finished with code: 0
> {code}
> From the user's perspective, this output can be interpreted as a bunch of
> LRTs (long-running transactions). Moreover, these transactions cannot be
> killed through the `control.sh --kill` command and remain in the active
> transactions list until the node is rebooted.
> It is worth mentioning that the described problem is not reproduced for every
> rolled back transaction, only for some under certain conditions.
> Reproducer:
> 1. Start a server node.
> 2. Start a tx through a thin client with a timeout.
> 3. Inject a sleep longer than the tx timeout into IgniteTxManager#onCreated,
> right after the isCompleted check. Such a delay can occur in practice if the
> thread that started the transaction is preempted by the scheduler.
> 4. Wait for the tx to fail with a timeout error.
> As a result, the transaction is rolled back by the timeout worker and only
> then stored in the active transactions map by the IgniteTxManager#onCreated
> method.
> The "hanging" transactions in ROLLED_BACK state described above do not hold
> any data key locks and do not affect PME in any way.
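
The race described in the reproducer can be modeled without Ignite as a
check-then-act problem on a shared map. The sketch below is an illustration
only: the names TxRegistry, onCreatedRacy, onCreatedGuarded and
rollbackOnTimeout are hypothetical, this is not the IgniteTxManager code, and
the guarded variant is just one possible mitigation, not necessarily what
IGNITE-25541 changed. The timeout worker is simulated by a callback that runs
in the window between the completion check and the map insertion, which makes
the interleaving deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified, hypothetical model of the race; not actual Ignite code. */
class RolledBackTxRace {
    enum State { ACTIVE, ROLLED_BACK }

    static final class Tx {
        final String xid;
        volatile State state = State.ACTIVE;

        Tx(String xid) { this.xid = xid; }

        boolean isCompleted() { return state != State.ACTIVE; }
    }

    static final class TxRegistry {
        final Map<String, Tx> active = new ConcurrentHashMap<>();

        /** Check-then-act: the tx may complete between the check and the put. */
        void onCreatedRacy(Tx tx, Runnable injectedDelay) {
            if (tx.isCompleted())
                return;

            injectedDelay.run(); // stands in for the injected sleep / preemption

            active.put(tx.xid, tx);
        }

        /** Assumed mitigation: re-check completion after publishing the tx. */
        void onCreatedGuarded(Tx tx, Runnable injectedDelay) {
            if (tx.isCompleted())
                return;

            injectedDelay.run();

            active.put(tx.xid, tx);

            if (tx.isCompleted())
                active.remove(tx.xid, tx); // undo registration of a finished tx
        }

        /** What the timeout worker does in this model: roll back and unregister. */
        void rollbackOnTimeout(Tx tx) {
            tx.state = State.ROLLED_BACK;
            active.remove(tx.xid, tx); // no-op if the tx is not registered yet
        }
    }

    /** Returns how many entries stay in the active map after the racy path. */
    static int leakedAfterRacyRegistration() {
        TxRegistry reg = new TxRegistry();
        Tx tx = new Tx("tx-1");
        reg.onCreatedRacy(tx, () -> reg.rollbackOnTimeout(tx));
        return reg.active.size();
    }

    /** Returns how many entries stay in the active map after the guarded path. */
    static int leakedAfterGuardedRegistration() {
        TxRegistry reg = new TxRegistry();
        Tx tx = new Tx("tx-2");
        reg.onCreatedGuarded(tx, () -> reg.rollbackOnTimeout(tx));
        return reg.active.size();
    }

    public static void main(String[] args) {
        System.out.println("racy leaked=" + leakedAfterRacyRegistration());
        System.out.println("guarded leaked=" + leakedAfterGuardedRegistration());
    }
}
```

In the racy path the rollback finds nothing to remove, so the later put leaves
a ROLLED_BACK entry that no other code path cleans up, which matches the
symptom in the `control.sh tx` output above. The guarded path relies on the
state field being volatile so that the re-check observes the rollback.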