[ 
https://issues.apache.org/jira/browse/IGNITE-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Vinogradov updated IGNITE-17908:
--------------------------------------
    Labels: ise  (was: )

> AssertionError LWM after reserved on data insertion after the cluster restart
> -----------------------------------------------------------------------------
>
>                 Key: IGNITE-17908
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17908
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Anton Vinogradov
>            Priority: Major
>              Labels: ise
>         Attachments: LwmAfterReservedTest.java
>
>
> After the cluster restart you may see the following assertion:
> {code}
> java.lang.AssertionError: LWM after reserved: lwm=2030, reserved=2010, 
> cntr=Counter [lwm=2030, missed=[], hwm=2030, reserved=2011]
>       at 
> org.apache.ignite.internal.processors.cache.PartitionUpdateCounterTrackingImpl.reserve(PartitionUpdateCounterTrackingImpl.java:270)
>       at 
> org.apache.ignite.internal.processors.cache.PartitionUpdateCounterErrorWrapper.reserve(PartitionUpdateCounterErrorWrapper.java:58)
>       at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.getAndIncrementUpdateCounter(IgniteCacheOffheapManagerImpl.java:1594)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.getAndIncrementUpdateCounter(GridCacheOffheapManager.java:2483)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.getAndIncrementUpdateCounter(GridDhtLocalPartition.java:942)
>       at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.calculatePartitionUpdateCounters(IgniteTxLocalAdapter.java:510)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1356)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:726)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1132)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.prepareAsyncLocal(GridNearTxLocal.java:4282)
>       at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareColocatedTx(IgniteTxHandler.java:303)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.proceedPrepare(GridNearOptimisticTxPrepareFuture.java:565)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.prepareSingle(GridNearOptimisticTxPrepareFuture.java:392)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.prepare0(GridNearOptimisticTxPrepareFuture.java:335)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFutureAdapter.prepareOnTopology(GridNearOptimisticTxPrepareFutureAdapter.java:205)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFutureAdapter.prepare(GridNearOptimisticTxPrepareFutureAdapter.java:129)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.prepareNearTxLocal(GridNearTxLocal.java:3946)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.commitNearTxLocalAsync(GridNearTxLocal.java:3994)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.optimisticPutFuture(GridNearTxLocal.java:3051)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.putAsync0(GridNearTxLocal.java:729)
>       at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.putAsync(GridNearTxLocal.java:484)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter$20.op(GridCacheAdapter.java:2511)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter$20.op(GridCacheAdapter.java:2509)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:4284)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.put0(GridCacheAdapter.java:2509)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2487)
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2466)
>       at 
> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1332)
>       at 
> org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:867)
>       at 
> org.apache.ignite.util.BrokenRebalanceTest.testCountersOnCrachRecovery(BrokenRebalanceTest.java:191)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at 
> org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2506)
> {code}
> when the cluster was under load and had counter gaps at restart.
> See  [^LwmAfterReservedTest.java] 
> A possible solution is to change the 
> {{PartitionUpdateCounterTrackingImpl#reservedCntr}} calculation 
> from 
> {code}
> long max = Math.max(val, curLwm);
> {code}
> to something like
> {code}
> long max = Math.max(val, curLwm);
> max = Math.max(max, highestAppliedCounter());
> {code}
> Another possible fix is to get rid of 
> {{PartitionUpdateCounterTrackingImpl#reservedCntr}} and always use 
> {{highestAppliedCounter()}}.
> This may slow down the system, but it looks like the correct fix, so some 
> optimisation may be required.
> Please make sure that counters are correct after the fix using idle_verify.
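
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyCounter`, `restore`, and `apply` are hypothetical names made up for illustration, not Ignite's actual classes or methods, and the real counter is concurrent and considerably more involved. The model assumes the counter is restored with lwm=2010 and one out-of-order applied range covering 2021..2030, so there is a gap at 2011..2020. Once the gap closes, the LWM jumps to 2030 while the reserved counter is still 2010, unless it was also raised to the highest applied counter at restore time.

```java
import java.util.TreeMap;

/** Toy model only: names and structure are hypothetical, not Ignite's actual classes. */
final class LwmAfterReservedSketch {
    static final class ToyCounter {
        long lwm;      // every update with counter <= lwm has been applied
        long reserved; // next counter value handed out to a new update
        final TreeMap<Long, Long> outOfOrder = new TreeMap<>(); // range start -> delta

        /** Restores a counter that has a single out-of-order applied range above the LWM. */
        static ToyCounter restore(long lwm, long rangeStart, long rangeDelta, boolean useHighestApplied) {
            ToyCounter c = new ToyCounter();
            c.lwm = lwm;
            c.outOfOrder.put(rangeStart, rangeDelta);
            long max = lwm; // stands in for Math.max(val, curLwm) with val == curLwm
            if (useHighestApplied)
                max = Math.max(max, c.highestAppliedCounter()); // the proposed extra step
            c.reserved = max;
            return c;
        }

        /** End of the last applied range, or the LWM when there are no gaps. */
        long highestAppliedCounter() {
            return outOfOrder.isEmpty() ? lwm : outOfOrder.lastKey() + outOfOrder.lastEntry().getValue();
        }

        /** Marks (start, start + delta] as applied and advances the LWM over closed gaps. */
        void apply(long start, long delta) {
            outOfOrder.put(start, delta);
            Long d;
            while ((d = outOfOrder.remove(lwm)) != null)
                lwm += d;
        }

        /** Hands out the next counter; fails if the LWM has already passed it. */
        long reserve(long delta) {
            long r = reserved;
            reserved += delta;
            if (r < lwm)
                throw new AssertionError("LWM after reserved: lwm=" + lwm + ", reserved=" + r);
            return r;
        }
    }

    public static void main(String[] args) {
        ToyCounter broken = ToyCounter.restore(2010, 2020, 10, false);
        broken.apply(2010, 10); // closes the gap, LWM becomes 2030
        try {
            broken.reserve(1);
        } catch (AssertionError e) {
            System.out.println("without fix: " + e.getMessage());
        }

        ToyCounter fixed = ToyCounter.restore(2010, 2020, 10, true);
        fixed.apply(2010, 10);
        System.out.println("with fix: reserved " + fixed.reserve(1));
    }
}
```

In this model the first variant fails with lwm=2030 and reserved=2010, the same values as in the reported assertion, while the second variant reserves 2030. Whether the real fix behaves this way still has to be confirmed against the attached test and with idle_verify.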



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
