Vyacheslav Koptilin created IGNITE-16621:
--------------------------------------------
Summary: AtomicSequence.incrementAndGet() fails intermittently.
Key: IGNITE-16621
URL: https://issues.apache.org/jira/browse/IGNITE-16621
Project: Ignite
Issue Type: Bug
Components: data structures
Reporter: Vyacheslav Koptilin
Assignee: Vyacheslav Koptilin
Using _IgniteAtomicSequence_ can lead to the following _AssertionError_:
{noformat}
java.lang.AssertionError: null
    at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl$2.call(GridCacheAtomicSequenceImpl.java:307)
    at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl$2.call(GridCacheAtomicSequenceImpl.java:298)
    at org.apache.ignite.internal.processors.cache.GridCacheUtils.retryTopologySafe(GridCacheUtils.java:1418)
    at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl.internalUpdate(GridCacheAtomicSequenceImpl.java:230)
    at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl.incrementAndGet(GridCacheAtomicSequenceImpl.java:135)
{noformat}
The failing assertion is in the following code:
{code:java}
private Callable<Long> internalUpdate(final long l, final boolean updated) {
    return new Callable<Long>() {
        @Override public Long call() throws Exception {
            assert distUpdateFreeTop.isHeldByCurrentThread() ||
                distUpdateLockedTop.isHeldByCurrentThread();

            try (GridNearTxLocal tx = CU.txStartInternal(ctx, cacheView, PESSIMISTIC, REPEATABLE_READ)) {
                GridCacheAtomicSequenceValue seq = cacheView.get(key);

                checkRemoved();

                // This assert can fail if the partition loss policy is IGNORE
                // and the partition that holds the sequence has been lost.
                assert seq != null;

                // ... (rest of the update logic elided)
            }
        }
    };
}
{code}
The root cause of the issue is that, in the in-memory case, the partition loss policy is
IGNORE. Therefore, the following read can return null without throwing an exception,
which then triggers the AssertionError above.
{code:java}
try (GridNearTxLocal tx = CU.txStartInternal(ctx, cacheView, PESSIMISTIC, REPEATABLE_READ)) {
    // Under the IGNORE policy, a read from a lost partition returns null instead of failing.
    GridCacheAtomicSequenceValue seq = cacheView.get(key);
    // ...
}
{code}
A possible workaround is to configure a reasonable number of backups in
AtomicConfiguration. Monitoring lost partitions would also help.
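A minimal sketch of the workaround, assuming Ignite 2.x with ignite-core on the classpath. It gives the atomic data structures two backups (the value 2 is only an example) and enables the EVT_CACHE_REBALANCE_PART_DATA_LOST event, which Ignite does not record unless it is explicitly included, so that lost partitions can be observed. The class and method names are hypothetical.

{code:java}
import org.apache.ignite.configuration.AtomicConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.EventType;

/** Hypothetical helper that builds a node configuration for the workaround. */
public class AtomicBackupsConfig {
    static IgniteConfiguration build(int backups) {
        AtomicConfiguration atomicCfg = new AtomicConfiguration();
        // Keep copies of each atomic-structure partition on 'backups' additional nodes,
        // so losing a single node does not lose the sequence state.
        atomicCfg.setBackups(backups);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setAtomicConfiguration(atomicCfg);
        // Events are not recorded by default; include the partition-loss event for monitoring.
        cfg.setIncludeEventTypes(EventType.EVT_CACHE_REBALANCE_PART_DATA_LOST);
        return cfg;
    }

    public static void main(String[] args) {
        IgniteConfiguration cfg = build(2);
        System.out.println("atomic backups = " + cfg.getAtomicConfiguration().getBackups());
    }
}
{code}

After a node is started with _Ignition.start(cfg)_, a listener registered through _ignite.events().localListen(...)_ for this event type should be notified when partitions are lost.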
The proposed solution is straightforward: replace _assert seq != null;_ with an
explicit null check that throws a suitable exception. This would allow the user to
detect the problem and, for example, re-create the sequence.
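The following self-contained sketch models the proposed change outside of Ignite. Everything in it is hypothetical: a _HashMap_ stands in for _cacheView_, and _SequenceStateLostException_, _reserveNext_ and _losePartition_ are illustrative names, not Ignite APIs; the exception the actual fix throws may differ. Unlike _assert_, which is disabled unless the JVM runs with _-ea_, the explicit check also fails with a clear message in production, where the original code would presumably fail later when _seq_ is dereferenced.

{code:java}
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the proposed fix; the Map stands in for the internal atomics cache. */
public class SequenceNullCheckSketch {
    /** Hypothetical exception signalling that the sequence state is gone. */
    static class SequenceStateLostException extends IllegalStateException {
        SequenceStateLostException(String msg) {
            super(msg);
        }
    }

    /** Stand-in for cacheView: sequence name -> upper bound of the last reserved range. */
    private final Map<String, Long> cacheView = new HashMap<>();

    void create(String name, long initVal) {
        cacheView.put(name, initVal);
    }

    /** Simulates a lost partition under the IGNORE policy: subsequent reads return null. */
    void losePartition(String name) {
        cacheView.remove(name);
    }

    /** Reserves the next batch of values and returns the new upper bound. */
    long reserveNext(String name, long batchSize) {
        Long seq = cacheView.get(name);
        // Explicit check instead of 'assert seq != null': it works even with assertions disabled.
        if (seq == null)
            throw new SequenceStateLostException(
                "Atomic sequence state not found (its partition may have been lost): " + name);

        long newUpper = seq + batchSize;
        cacheView.put(name, newUpper);
        return newUpper;
    }

    public static void main(String[] args) {
        SequenceNullCheckSketch store = new SequenceNullCheckSketch();
        store.create("seq", 0L);
        System.out.println(store.reserveNext("seq", 1000)); // prints 1000

        store.losePartition("seq");
        try {
            store.reserveNext("seq", 1000);
        }
        catch (SequenceStateLostException e) {
            // The caller detects the loss and re-creates the sequence, choosing an
            // initial value above any value that may already have been handed out.
            store.create("seq", 5000L);
            // prints "re-created, next upper bound = 6000"
            System.out.println("re-created, next upper bound = " + store.reserveNext("seq", 1000));
        }
    }
}
{code}

Choosing the initial value for the re-created sequence is left to the application, since the lost partition may have held the only record of which values were already issued.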