[ https://issues.apache.org/jira/browse/IGNITE-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952729#comment-16952729 ]
Matija Polajnar commented on IGNITE-10226:
------------------------------------------
In development environments (luckily, only there so far) we sometimes get
errors like this one:
{code:java}
...
Caused by: javax.cache.CacheException: class org.apache.ignite.cluster.ClusterTopologyException: Cannot run update query. Node must own all the necessary partitions.
	at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.IgniteCacheFutureImpl.convertException(IgniteCacheFutureImpl.java:62) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:137) ~[ignite-core-2.7.0.jar:2.7.0]
	at com.marand.thinkehr.tasks.common.ignite.IgniteCompletableFuture.lambda$new$2ae3f52e$1(IgniteCompletableFuture.java:25) ~[classes/:?]
	at org.apache.ignite.internal.util.future.IgniteFutureImpl$InternalFutureListener.apply(IgniteFutureImpl.java:215) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.util.future.IgniteFutureImpl$InternalFutureListener.apply(IgniteFutureImpl.java:179) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:385) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:355) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.util.future.IgniteFutureImpl.listen(IgniteFutureImpl.java:71) ~[ignite-core-2.7.0.jar:2.7.0]
...
Caused by: org.apache.ignite.cluster.ClusterTopologyException: Cannot run update query. Node must own all the necessary partitions.
	at org.apache.ignite.internal.util.IgniteUtils$7.apply(IgniteUtils.java:888) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.util.IgniteUtils$7.apply(IgniteUtils.java:886) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.IgniteCacheFutureImpl.convertException(IgniteCacheFutureImpl.java:62) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:137) ~[ignite-core-2.7.0.jar:2.7.0]
...
Caused by: org.apache.ignite.internal.cluster.ClusterTopologyCheckedException: Cannot run update query. Node must own all the necessary partitions.
	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxAbstractEnlistFuture.checkPartitions(GridDhtTxAbstractEnlistFuture.java:922) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxAbstractEnlistFuture.init(GridDhtTxAbstractEnlistFuture.java:336) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxEnlistFuture.enlistLocal(GridNearTxEnlistFuture.java:518) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxEnlistFuture.sendBatch(GridNearTxEnlistFuture.java:413) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxEnlistFuture.sendNextBatches(GridNearTxEnlistFuture.java:168) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxEnlistFuture.map(GridNearTxEnlistFuture.java:144) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxAbstractEnlistFuture.init(GridNearTxAbstractEnlistFuture.java:241) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.updateAsync(GridNearTxLocal.java:2099) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.mvccRemoveAllAsync0(GridNearTxLocal.java:1976) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.removeAllAsync0(GridNearTxLocal.java:1689) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.removeAllAsync(GridNearTxLocal.java:554) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.GridCacheAdapter$40.op(GridCacheAdapter.java:3174) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.GridCacheAdapter$AsyncOp.op(GridCacheAdapter.java:5288) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.GridCacheAdapter.asyncOp(GridCacheAdapter.java:4450) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.GridCacheAdapter.asyncOp(GridCacheAdapter.java:4345) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.GridCacheAdapter.removeAllAsync0(GridCacheAdapter.java:3172) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.GridCacheAdapter.removeAllAsync(GridCacheAdapter.java:3159) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.removeAllAsync(IgniteCacheProxyImpl.java:1342) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.removeAllAsync(GatewayProtectedCacheProxy.java:1072) ~[ignite-core-2.7.0.jar:2.7.0]
...
{code}
Since we embed Ignite in a Java application, it probably gets shut down
uncleanly quite often during development. This is typically a single-node setup.
The backup count is set to 1, but there is only one node anyway (so I'm not sure
why any partition would ever be MOVING).
I set a breakpoint in GridDhtTxAbstractEnlistFuture.checkPartitions and found
that the offending partitions were in the MOVING state.
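The failing check can be pictured with a small, self-contained model. This is not Ignite's implementation: PartitionCheckModel, checkPartitions and the use of IllegalStateException are hypothetical stand-ins for GridDhtTxAbstractEnlistFuture.checkPartitions and ClusterTopologyCheckedException, and the enum only borrows the state names of Ignite's internal GridDhtPartitionState. Assuming, as the error message suggests, that every partition touched by the update must be OWNING on the local node, a single partition stuck in MOVING is enough to fail the whole update, even on a one-node cluster.

```java
import java.util.List;
import java.util.Map;

public class PartitionCheckModel {
    // State names borrowed from Ignite's internal GridDhtPartitionState enum (model only).
    public enum PartState { MOVING, OWNING, RENTING, EVICTED, LOST }

    // Hypothetical stand-in for the pre-update check: every partition touched
    // by the update must be OWNING on the local node, otherwise the update fails.
    public static void checkPartitions(Map<Integer, PartState> localParts, List<Integer> touched) {
        for (int p : touched) {
            PartState state = localParts.get(p); // null if the partition is not local at all
            if (state != PartState.OWNING) {
                throw new IllegalStateException("Cannot run update query. Node must own all "
                        + "the necessary partitions. [part=" + p + ", state=" + state + "]");
            }
        }
    }

    public static void main(String[] args) {
        // Single node that should own everything, but partition 7 was restored as MOVING.
        Map<Integer, PartState> parts = Map.of(3, PartState.OWNING, 7, PartState.MOVING);

        checkPartitions(parts, List.of(3)); // succeeds: partition 3 is OWNING
        try {
            checkPartitions(parts, List.of(3, 7)); // throws because of partition 7
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model an update touching only OWNING partitions goes through, while any batch that includes the MOVING partition is rejected, which matches the pattern seen in the debugger.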
I suspect this might also explain why IgniteCache.get(x) and
IgniteCache.containsKey(x) sometimes return null and false, respectively, even
though the cache definitely contains key x with a non-null value (e.g.
cache.containsKey(cache.iterator().next().getKey()) returns false).
Presumably resetLostPartitions has no effect in this case?
> Partition may restore wrong MOVING state during crash recovery
> --------------------------------------------------------------
>
> Key: IGNITE-10226
> URL: https://issues.apache.org/jira/browse/IGNITE-10226
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Affects Versions: 2.4
> Reporter: Pavel Kovalenko
> Assignee: Pavel Kovalenko
> Priority: Major
> Fix For: 2.8
>
>
> This can only be reproduced in versions that do not include IGNITE-9420:
> 1) Start a cache, load some data into its partitions, call forceCheckpoint.
> 2) Start loading additional data, then kill the node. The node should be killed
> so that the last checkpoint is skipped, or during the checkpoint mark phase.
> 3) Restart the node. Crash recovery for the partitions starts. When a partition
> is created during crash recovery (topology().forceCreatePartition()), its
> initial state is logged to the WAL. If there is any logical update related to
> the partition, the wrong MOVING state is logged at the end of the current WAL.
> This state is then considered the last valid one when PartitionMetaStateRecord
> records are processed during logical recovery. In the "restorePartitionsState"
> phase this state is chosen as final and the partition changes to MOVING, even
> if in page memory it is OWNING or in some other state.
> To fix this problem in versions 2.4 through 2.7, the additional logging of
> partition state changes to the WAL during crash recovery (logical recovery)
> should be removed.
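The recovery sequence from step 3 can be reduced to a toy model in which the last partition-state record replayed from the WAL wins over the state held in page memory. Everything here is hypothetical and simplified: CrashRecoveryModel, StateRecord and recover() are illustrative names, the forced partition creation and the restorePartitionsState phase are collapsed into a single pass, and only two states are modelled. The logStateOnCreate flag switches between the behaviour described in the issue (the initial MOVING state of a re-created partition is appended to the WAL) and the proposed fix (that extra logging is removed).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CrashRecoveryModel {
    // Only the two states relevant to the bug are modelled.
    public enum PartState { MOVING, OWNING }

    // Hypothetical partition-state WAL record (loosely modelled on PartitionMetaStateRecord).
    public static final class StateRecord {
        final int partId;
        final PartState state;

        public StateRecord(int partId, PartState state) {
            this.partId = partId;
            this.state = state;
        }
    }

    /**
     * Returns the partition states chosen after logical recovery (toy model).
     *
     * @param pageMemory       state of each partition as found in page memory
     * @param updatedParts     partitions with logical updates after the last checkpoint
     * @param wal              state records already present in the WAL
     * @param logStateOnCreate true = behaviour described in the issue, false = proposed fix
     */
    public static Map<Integer, PartState> recover(Map<Integer, PartState> pageMemory,
                                                  List<Integer> updatedParts,
                                                  List<StateRecord> wal,
                                                  boolean logStateOnCreate) {
        List<StateRecord> log = new ArrayList<>(wal);

        // A partition touched by a logical update is force-created during recovery.
        // Its initial state is MOVING; the buggy path appends that state to the WAL.
        if (logStateOnCreate) {
            for (int p : updatedParts) {
                log.add(new StateRecord(p, PartState.MOVING));
            }
        }

        // Restore phase: start from page memory, then the last state record
        // replayed for each partition overrides it.
        Map<Integer, PartState> result = new HashMap<>(pageMemory);
        for (StateRecord r : log) {
            result.put(r.partId, r.state);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, PartState> pageMemory = Map.of(0, PartState.OWNING, 1, PartState.OWNING);
        List<StateRecord> wal = List.of(new StateRecord(0, PartState.OWNING),
                                        new StateRecord(1, PartState.OWNING));
        List<Integer> updated = List.of(1); // only partition 1 was written after the checkpoint

        // Partition 1 comes back as MOVING although page memory says OWNING.
        System.out.println("buggy: " + recover(pageMemory, updated, wal, true));
        // Without the extra logging both partitions stay OWNING.
        System.out.println("fixed: " + recover(pageMemory, updated, wal, false));
    }
}
```

With logStateOnCreate set to true, a partition that is OWNING in page memory but has logical updates after the last checkpoint comes back as MOVING; with the flag set to false its page-memory state survives, which is the effect the proposed fix aims for.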
--
This message was sent by Atlassian Jira
(v8.3.4#803005)