[
https://issues.apache.org/jira/browse/IGNITE-16621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitry Pavlov updated IGNITE-16621:
-----------------------------------
Labels: ise (was: )
> AtomicSequence.incrementAndGet() fails intermittently.
> ------------------------------------------------------
>
> Key: IGNITE-16621
> URL: https://issues.apache.org/jira/browse/IGNITE-16621
> Project: Ignite
> Issue Type: Bug
> Components: data structures
> Reporter: Vyacheslav Koptilin
> Assignee: Vyacheslav Koptilin
> Priority: Major
> Labels: ise
> Fix For: 2.13
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Using _IgniteAtomicSequence_ can lead to the following _AssertionError_:
> {noformat}
> java.lang.AssertionError: null
>     at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl$2.call(GridCacheAtomicSequenceImpl.java:307)
>     at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl$2.call(GridCacheAtomicSequenceImpl.java:298)
>     at org.apache.ignite.internal.processors.cache.GridCacheUtils.retryTopologySafe(GridCacheUtils.java:1418)
>     at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl.internalUpdate(GridCacheAtomicSequenceImpl.java:230)
>     at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl.incrementAndGet(GridCacheAtomicSequenceImpl.java:135)
> {noformat}
> The following code produces the mentioned error:
> {code:java}
> private Callable<Long> internalUpdate(final long l, final boolean updated) {
>     return new Callable<Long>() {
>         @Override public Long call() throws Exception {
>             assert distUpdateFreeTop.isHeldByCurrentThread() ||
>                 distUpdateLockedTop.isHeldByCurrentThread();
>             try (GridNearTxLocal tx = CU.txStartInternal(ctx, cacheView,
>                 PESSIMISTIC, REPEATABLE_READ)) {
>                 GridCacheAtomicSequenceValue seq = cacheView.get(key);
>                 checkRemoved();
>                 // This assert can trigger the error when the partition loss
>                 // policy is IGNORE and the corresponding partition has been lost.
>                 assert seq != null;
> {code}
> The root cause is that, for in-memory caches, the partition loss policy is
> IGNORE. The following read can therefore return null without throwing any
> exception, which triggers the AssertionError shown above.
> {code:java}
> try (GridNearTxLocal tx = CU.txStartInternal(ctx, cacheView, PESSIMISTIC,
>     REPEATABLE_READ)) {
>     GridCacheAtomicSequenceValue seq = cacheView.get(key);
> {code}
> A possible workaround is to set a reasonable number of backups in
> AtomicConfiguration. Monitoring lost partitions would help as well.
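The backups workaround could look roughly like the configuration sketch below. The backup count of 2, the sequence name "orderIds", and the class name are illustrative assumptions, not values taken from the issue; pick a backup count that fits the cluster size.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteAtomicSequence;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.AtomicConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class AtomicBackupsExample {
    public static void main(String[] args) {
        // Keep backup copies of the partitions that hold atomic data structures,
        // so that losing a single node does not lose the sequence value.
        AtomicConfiguration atomicCfg = new AtomicConfiguration();
        atomicCfg.setCacheMode(CacheMode.PARTITIONED);
        atomicCfg.setBackups(2); // assumed value, tune for your topology

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setAtomicConfiguration(atomicCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            // "orderIds" is a hypothetical sequence name used only for illustration.
            IgniteAtomicSequence seq = ignite.atomicSequence("orderIds", 0, true);

            System.out.println(seq.incrementAndGet());
        }
    }
}
```

Backups only reduce the chance of partition loss. If all owners of a partition leave the cluster at once, the read can still return null.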
> The proposed solution is straightforward: replace the assertion _assert seq
> != null;_ with an explicit check that throws a suitable exception. This
> allows the user to detect the problem and, for example, re-create the sequence.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)