[ 
https://issues.apache.org/jira/browse/GEODE-10395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mario Kevo updated GEODE-10395:
-------------------------------
    Labels:   (was: needsTriage)

> TXLockIdImpl memory leak after CommitConflictException from another node
> ------------------------------------------------------------------------
>
>                 Key: GEODE-10395
>                 URL: https://issues.apache.org/jira/browse/GEODE-10395
>             Project: Geode
>          Issue Type: Bug
>    Affects Versions: 1.14.0, 1.15.0
>            Reporter: Eugene Nedzvetsky
>            Assignee: Mario Kevo
>            Priority: Major
>
> org.apache.geode.internal.cache.locks.TXLockServiceImpl#txLock:120 adds 
> TXLockIdImpl  objects to TXLockServiceImpl#txLockIdList. 
> {code:java}
> synchronized (txLockIdList) {
>         txLockId = new TXLockIdImpl(dlock.getDistributionManager().getId());
>         txLockIdList.add(txLockId);
>       }
> {code}
> These objects will not be removed from this list if dlock.acquireTryLocks 
> returned false.
> {code:java}
>       gotLocks = dlock.acquireTryLocks(batch, TIMEOUT_MILLIS, LEASE_MILLIS, 
> keyIfFail);
>       if (gotLocks) { // ...otherwise race can occur between tryLocks and 
> readLock
>         acquireRecoveryReadLock();
>       } else if (keyIfFail[0] != null) {
>         throw new CommitConflictException(
>             String.format("Concurrent transaction commit detected %s",
>                 keyIfFail[0]));
>       } else {
>         throw new CommitConflictException(
>             String.format("Failed to request try locks from grantor: %s",
>                 dlock.getLockGrantorId()));
>       }
> {code}
> It throws CommitConflictException and after that system doesn't have a 
> txLockId reference and this txLockId will be never removed from this list.
> It produces critical performance degradation. txLockIdList has a few hundred 
> thousand txLocks after a few weeks and TXLockServiceImpl#release iterates 
> this list 2 times on every tx commit and the same time "synchronized 
> (txLockIdList)" locks other threads.
> TXLockIdImpl#equals works really slow because it checks bunch of variables in 
> memberId.equals(that.memberId).
> {code:java}
> public void release(TXLockId txLockId) {
>     synchronized (txLockIdList) {
>       if (!txLockIdList.contains(txLockId)) {
>         // TXLockService.destroyServices can be invoked in cache.close().
>         // Other P2P threads could process message such as TXCommitMessage 
> afterwards,
>         // and invoke TXLockService.createDTLS(). It could create a new 
> TXLockService
>         // which will have a new empty list (txLockIdList) and it will not
>         // contain the originally added txLockId
>         throw new IllegalArgumentException(
>             String.format("Invalid txLockId not found: %s",
>                 txLockId));
>       }
>       dlock.releaseTryLocks(txLockId, () -> {
>         return recovering;
>       });
>       txLockIdList.remove(txLockId);
>       releaseRecoveryReadLock();
>     }
>   }
> {code}
> I think TXLockServiceImpl#txLock should remove this txLockId from 
> TXLockServiceImpl#txLockIdList in case of CommitConflictException:
> {code:java}
>  if (gotLocks) { // ...otherwise race can occur between tryLocks and readLock
>                 acquireRecoveryReadLock();
>             } else if (keyIfFail[0] != null) {
>                 synchronized (this.txLockIdList) {
>                     this.txLockIdList.remove(txLockId);
>                 }
>                 throw new CommitConflictException(
>                         String.format("Concurrent transaction commit detected 
> %s",
>                                 keyIfFail[0]));
>             } else {
>                 synchronized (this.txLockIdList) {
>                     this.txLockIdList.remove(txLockId);
>                 }
>                 throw new CommitConflictException(
>                         String.format("Failed to request try locks from 
> grantor: %s",
>                                 this.dlock.getLockGrantorId()));
>             }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to