[ 
https://issues.apache.org/jira/browse/IGNITE-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210785#comment-17210785
 ] 

Anton Kalashnikov commented on IGNITE-13565:
--------------------------------------------

In my opinion, this is not a potential bug; it is already a bug. It looks like 
data corruption occurs when a DurableBackgroundTask has finished but its status 
has not been updated in the metastore. Finishing a DurableBackgroundTask and 
changing its status in the metastore are not a single atomic operation, so 
nobody can guarantee that the node doesn't fail between these two actions. 
Perhaps we need to add some atomic operation for detecting that the 
DurableBackgroundTask has finished (maybe we should write something to the WAL).
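
To make the window between the two actions concrete, here is a minimal toy sketch in Java. It is not Ignite code: the class, the in-memory "metastore" and "wal" sets, and the task name are all hypothetical stand-ins, and it assumes the completion marker can be written atomically with the task's last step, which a real implementation would still have to guarantee.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model only; every name here is hypothetical and none of it is Ignite API. */
public class DurableTaskSketch {
    /** Stand-in for the metastore: names of tasks still registered as pending. */
    static final Set<String> metastore = new HashSet<>();

    /** Stand-in for the WAL: completion markers assumed to survive a crash. */
    static final Set<String> wal = new HashSet<>();

    /** Counts how many times the task body actually ran. */
    static int executions;

    /** Runs the task body, optionally logs a completion marker, optionally "crashes". */
    static void runTask(String name, boolean logCompletion, boolean crashBeforeStatusUpdate) {
        executions++;                   // the task body, e.g. destroying an index tree
        if (logCompletion)
            wal.add(name);              // marker assumed atomic with the task's last step
        if (crashBeforeStatusUpdate)
            return;                     // simulated node failure: status is never updated
        metastore.remove(name);         // the separate, non-atomic status update
    }

    /** Restart: re-run tasks still listed in the metastore unless a marker exists. */
    static void recover(boolean logCompletion) {
        for (String name : new HashSet<>(metastore)) {
            if (wal.contains(name))
                metastore.remove(name); // already finished: only repair the stale status
            else
                runTask(name, logCompletion, false);
        }
    }

    /** Returns how many times the task body ran across one crash and one restart. */
    static int scenario(boolean logCompletion) {
        metastore.clear();
        wal.clear();
        executions = 0;
        metastore.add("drop-idx1");
        runTask("drop-idx1", logCompletion, true);
        recover(logCompletion);
        return executions;
    }

    public static void main(String[] args) {
        System.out.println("without marker: " + scenario(false));
        System.out.println("with marker: " + scenario(true));
    }
}
```

Without the marker, the restart re-executes a task whose work is already done, which is the situation that leads to the corruption described above; with the marker, the restart only cleans up the stale status.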

> Potential further bugs with DurableBackgroundTasks.
> ---------------------------------------------------
>
>                 Key: IGNITE-13565
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13565
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 2.8.1
>            Reporter: Stanilovsky Evgeny
>            Priority: Major
>
> After some code refactoring [1], we get a problem with a simple test: 
> org.apache.ignite.internal.processors.cache.index.BasicIndexTest#testInlineSizeChange
> between 
> {noformat}
> execSql(cache, "drop index \"idx1\"");
> {noformat}
> and
> {noformat}
> ig0 = startGrid(0);
> {noformat}
> operations. It seems [2] will fix it, but the problem could potentially happen 
> again (check the attached stacks). In short, an already completed durable task 
> has not updated its 
> {noformat}
> DurableBackgroundTask#complete
> {noformat}
> status in the metastore, so once the cluster is running again this task can 
> still try to run once more, with undefined behavior. [~Denis Chudov], 
> [~makedonskaya], please take a look.
> [1] https://issues.apache.org/jira/browse/IGNITE-13207
> [2] https://issues.apache.org/jira/browse/IGNITE-13500
> {noformat}
> [2020-10-09 11:42:41,982][INFO ][test-runner-#1%index.BasicIndexTest%][root] 
> >>> Stopping grid [name=index.BasicIndexTest0, 
> id=161e62a2-1a5d-46b0-892d-2e0274e00000]
> [2020-10-09 
> 11:42:41,999][ERROR][db-checkpoint-thread-#61%index.BasicIndexTest0%][root] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeFailureHandler [super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
> [type=CRITICAL_ERROR, err=class o.a.i.IgniteException: Failed to perform 
> cache update: node is stopping.]]
> class org.apache.ignite.IgniteException: Failed to perform cache update: node 
> is stopping.
>       at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:125)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1297)
>       at 
> org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.removeDurableBackgroundTask(DurableBackgroundTasksProcessor.java:245)
>       at 
> org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.onMarkCheckpointBegin(DurableBackgroundTasksProcessor.java:277)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointWorkflow.markCheckpointBegin(CheckpointWorkflow.java:274)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:387)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.body(Checkpointer.java:263)
>       at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>       at java.lang.Thread.run(Thread.java:748)
> ...
> starting grid and ...
> java.lang.AssertionError: calculatedOffset=49152, allocated=45056, 
> headerSize=4096, 
> cfgFile=/work/repo/apache-ignite/work/db/index_BasicIndexTest0/cache-default/index.bin
> >>> +-------------------------------------------+
> >>> Ignite ver. 2.10.0-SNAPSHOT#20201009-sha1:DEV
> >>> +-------------------------------------------+
>       at 
> org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.read(FilePageStore.java:492)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:554)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:538)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:884)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:710)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:699)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.DataStructure.acquirePage(DataStructure.java:158)
>       at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.acquirePage(BPlusTree.java:6037)
>       at 
> org.apache.ignite.internal.processors.query.h2.database.H2Tree.getMetaInfo(H2Tree.java:415)
>       at 
> org.apache.ignite.internal.processors.query.h2.database.H2Tree.<init>(H2Tree.java:241)
>       at 
> org.apache.ignite.internal.processors.query.h2.DurableBackgroundCleanupIndexTreeTask.execute(DurableBackgroundCleanupIndexTreeTask.java:140)
>       at 
> org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor$1.body(DurableBackgroundTasksProcessor.java:99)
>       at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>       at java.lang.Thread.run(Thread.java:748)
> {noformat}


