Vyacheslav Koptilin created IGNITE-16621:
--------------------------------------------

             Summary: AtomicSequence.incrementAndGet() fails intermittently.
                 Key: IGNITE-16621
                 URL: https://issues.apache.org/jira/browse/IGNITE-16621
             Project: Ignite
          Issue Type: Bug
          Components: data structures
            Reporter: Vyacheslav Koptilin
            Assignee: Vyacheslav Koptilin


Using _IgniteAtomicSequence_ can lead to the following _AssertionError_:
{noformat}
java.lang.AssertionError: null
    at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl$2.call(GridCacheAtomicSequenceImpl.java:307)
    at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl$2.call(GridCacheAtomicSequenceImpl.java:298)
    at org.apache.ignite.internal.processors.cache.GridCacheUtils.retryTopologySafe(GridCacheUtils.java:1418)
    at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl.internalUpdate(GridCacheAtomicSequenceImpl.java:230)
    at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl.incrementAndGet(GridCacheAtomicSequenceImpl.java:135)
{noformat}

The following code produces the mentioned error:
{code:java}
private Callable<Long> internalUpdate(final long l, final boolean updated) {
    return new Callable<Long>() {
        @Override public Long call() throws Exception {
            assert distUpdateFreeTop.isHeldByCurrentThread() || distUpdateLockedTop.isHeldByCurrentThread();

            try (GridNearTxLocal tx = CU.txStartInternal(ctx, cacheView, PESSIMISTIC, REPEATABLE_READ)) {
                GridCacheAtomicSequenceValue seq = cacheView.get(key);

                checkRemoved();

                // This assert can trigger the error if the partition loss policy is IGNORE
                // and the corresponding partition has been lost.
                assert seq != null;
{code}

The root cause is that, in the in-memory case, the partition loss policy is 
IGNORE. As a result, the following read can return null without throwing any 
exception, which triggers the AssertionError shown above.
{code:java}
try (GridNearTxLocal tx = CU.txStartInternal(ctx, cacheView, PESSIMISTIC, REPEATABLE_READ)) {
    GridCacheAtomicSequenceValue seq = cacheView.get(key);
{code}

A possible workaround is to set a reasonable number of backups in 
AtomicConfiguration. Monitoring lost partitions would help as well.
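
The backup setting mentioned above might be configured roughly as follows. This is a minimal sketch that assumes the Ignite 2.x public API on the classpath; the backup count of 2 and the sequence name {{orderIds}} are illustrative choices, not values taken from this report.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteAtomicSequence;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.AtomicConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class AtomicBackupsSketch {
    public static void main(String[] args) {
        // Keep extra copies of the data-structure partitions so that losing
        // a single node does not lose the sequence value.
        AtomicConfiguration atomicCfg = new AtomicConfiguration();
        atomicCfg.setBackups(2); // illustrative value, tune for the cluster size

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setAtomicConfiguration(atomicCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            // "orderIds" is a hypothetical sequence name; create it if absent.
            IgniteAtomicSequence seq = ignite.atomicSequence("orderIds", 0, true);

            System.out.println("Next value: " + seq.incrementAndGet());
        }
    }
}
```

Backups only reduce the chance of losing the partition. If every owner of the partition leaves the cluster, the read can still return null.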

The proposed solution is straightforward: replace the assertion _assert seq != null;_ 
with an explicit check that throws a suitable exception. This would allow the 
user to detect the problem and, for example, re-create the sequence.
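
The idea of replacing the assertion with an explicit check can be illustrated outside of Ignite. The sketch below is not the actual patch: the class {{SequenceNullCheckSketch}}, the map standing in for {{cacheView}}, the method {{simulatePartitionLoss()}} and the choice of {{IllegalStateException}} are all hypothetical, used only to show the pattern.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the sequence implementation; not Ignite code. */
class SequenceNullCheckSketch {
    /** Plays the role of cacheView: holds the current sequence value by key. */
    private final Map<String, Long> cacheView = new ConcurrentHashMap<>();

    private final String key;

    SequenceNullCheckSketch(String key, long initVal) {
        this.key = key;
        cacheView.put(key, initVal);
    }

    /** Simulates losing the partition that holds the sequence value. */
    void simulatePartitionLoss() {
        cacheView.remove(key);
    }

    /** Not thread-safe; the real code runs this step inside a transaction. */
    long incrementAndGet() {
        Long seq = cacheView.get(key);

        // Explicit check instead of "assert seq != null": the caller gets a
        // meaningful exception even when assertions are disabled.
        if (seq == null)
            throw new IllegalStateException("Sequence value is missing, " +
                "the partition may have been lost [key=" + key + ']');

        long next = seq + 1;
        cacheView.put(key, next);
        return next;
    }

    public static void main(String[] args) {
        SequenceNullCheckSketch seq = new SequenceNullCheckSketch("orderIds", 0L);
        System.out.println(seq.incrementAndGet());

        seq.simulatePartitionLoss();
        try {
            seq.incrementAndGet();
        }
        catch (IllegalStateException e) {
            // The caller can react here, for example by re-creating the sequence.
            System.out.println("Detected: " + e.getMessage());
        }
    }
}
```

With an exception of this kind, the caller has a defined point at which to handle the lost value instead of an AssertionError that appears only when assertions are enabled.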



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
