[
https://issues.apache.org/jira/browse/IGNITE-11699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823077#comment-16823077
]
Vyacheslav Koptilin commented on IGNITE-11699:
----------------------------------------------
The root cause of the mentioned behavior is that {{SegmentRouter}} is not aware
of the mode in which the WAL archive is disabled.
Under some circumstances, this causes
{{GridCacheDatabaseSharedManager#walTail}} to point to an incorrect WAL segment
after the physical/logical restore, which may break the node restart
procedure.
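A minimal, hypothetical sketch of segment routing (not Ignite's actual {{SegmentRouter}}) may make the root cause more concrete. It assumes two things. First, when the archiver is enabled, segments that are not yet archived occupy a fixed set of {{walSegments}} rotating work files. Second, when the archive is disabled (WAL archive path equal to the WAL path), each segment keeps its absolute index as its file name. The class name, {{findSegment}}, and the file-name format are illustrative assumptions.

{code:java}
import java.io.File;
import java.util.Locale;

/**
 * Hypothetical sketch, NOT Ignite's SegmentRouter: maps an absolute WAL segment
 * index to a file, taking the "archive disabled" mode into account.
 */
public class SegmentRouterSketch {
    private final File walWorkDir;
    private final File walArchiveDir;
    /** Number of rotating work files used when the archiver is enabled. */
    private final int walSegments;
    /** Absolute index of the last archived segment, or -1 if none. */
    private final long lastArchivedIdx;
    /** Assumption: the archive is disabled when it points to the WAL work directory. */
    private final boolean archiverEnabled;

    public SegmentRouterSketch(File walWorkDir, File walArchiveDir, int walSegments, long lastArchivedIdx) {
        this.walWorkDir = walWorkDir;
        this.walArchiveDir = walArchiveDir;
        this.walSegments = walSegments;
        this.lastArchivedIdx = lastArchivedIdx;
        this.archiverEnabled = !walWorkDir.getAbsoluteFile().equals(walArchiveDir.getAbsoluteFile());
    }

    /** Assumed file naming: 16-digit zero-padded absolute index plus ".wal". */
    public static String fileName(long absIdx) {
        return String.format(Locale.ROOT, "%016d.wal", absIdx);
    }

    public File findSegment(long absIdx) {
        if (!archiverEnabled) {
            // Archive disabled (assumed): segments are not rotated, so the
            // absolute index is the file name. Ignoring this mode and falling
            // through to the modulo mapping below yields the wrong file.
            return new File(walWorkDir, fileName(absIdx));
        }

        if (absIdx <= lastArchivedIdx)
            return new File(walArchiveDir, fileName(absIdx));

        // Archiver enabled, segment not archived yet: it occupies one of the
        // walSegments rotating work files.
        return new File(walWorkDir, fileName(absIdx % walSegments));
    }
}
{code}

With {{walSegments = 10}}, a router that ignores the disabled-archive mode would resolve absolute segment 12 to the work file for index 2. This shows how {{walTail}} could end up pointing at an incorrect WAL segment.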
I should also mention the following issues:
- First, the handling of exceptions while a node applies physical/logical
updates does not seem correct. Currently, all runtime exceptions are
ignored.
- Constructors of {{WALRecord}} subclasses (for example, {{InitNewPageRecord}}) may throw
an AssertionError, which may break the physical/logical restore of a node.
A corrupted entry/record may trigger an assertion error that cannot be properly
handled. So, I think we should avoid that, or explicitly throw
IgniteCheckedException instead of IgniteException, AssertionError, etc.
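A minimal sketch of both suggestions follows, in plain Java so it runs without Ignite on the classpath. All names in it are hypothetical:
- {{CorruptedRecordException}} stands in for {{IgniteCheckedException}}.
- {{InitPageRecord}} is a made-up record whose factory rejects invalid fields with a checked exception instead of an {{AssertionError}}. The validity rule (positive IO type and version) is an assumption for illustration.
- {{applyAll}} uses a plain {{ExecutorService}} rather than Ignite's {{StripedExecutor}}. It rethrows a failed update instead of ignoring it.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RestoreSketch {
    /** Stand-in for IgniteCheckedException: a checked exception the restore code must handle. */
    public static class CorruptedRecordException extends Exception {
        public CorruptedRecordException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    /** Hypothetical WAL record; the factory validates input instead of asserting. */
    public static final class InitPageRecord {
        public final long pageId;
        public final int ioType;
        public final int ioVer;

        private InitPageRecord(long pageId, int ioType, int ioVer) {
            this.pageId = pageId;
            this.ioType = ioType;
            this.ioVer = ioVer;
        }

        public static InitPageRecord create(long pageId, int ioType, int ioVer) throws CorruptedRecordException {
            // Assumed validity rule, for illustration only.
            if (ioType <= 0 || ioVer <= 0)
                throw new CorruptedRecordException("Corrupted WAL record [pageId=" + pageId
                    + ", ioType=" + ioType + ", ioVer=" + ioVer + ']', null);

            return new InitPageRecord(pageId, ioType, ioVer);
        }
    }

    /**
     * Applies updates in parallel and rethrows the first failure (in submission
     * order) as a checked exception instead of logging and ignoring it.
     */
    public static void applyAll(List<Runnable> updates, int threads)
        throws CorruptedRecordException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        try {
            List<Future<?>> futs = new ArrayList<>();

            for (Runnable update : updates)
                futs.add(pool.submit(update));

            for (Future<?> fut : futs) {
                try {
                    fut.get();
                }
                catch (ExecutionException e) {
                    // Any Throwable from the task (RuntimeException, AssertionError, ...)
                    // arrives here as the cause and is converted to a checked exception.
                    throw new CorruptedRecordException("Failed to apply WAL update", e.getCause());
                }
            }
        }
        finally {
            pool.shutdownNow();
        }
    }
}
{code}

The trade-off is that a single bad record stops the node from starting. However, it then fails with an explicit error, rather than completing a restore that silently skipped an update.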
> Node can't start after forced shutdown if the wal archiver disabled
> -------------------------------------------------------------------
>
> Key: IGNITE-11699
> URL: https://issues.apache.org/jira/browse/IGNITE-11699
> Project: Ignite
> Issue Type: Bug
> Components: persistence
> Affects Versions: 2.7
> Reporter: Pavel Vinokurov
> Assignee: Vyacheslav Koptilin
> Priority: Major
> Attachments: disabled-wal-archive-reproducer.zip
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> If a server node is killed while the WAL archive is disabled, it may fail on start
> with the following exception:
> {code:java}
> [18:37:53,887][SEVERE][sys-stripe-1-#2][G] Failed to execute runnable.
> java.lang.IllegalStateException: Failed to get page IO instance (page content is corrupted)
> at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:85)
> at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:97)
> at org.apache.ignite.internal.pagemem.wal.record.delta.MetaPageUpdatePartitionDataRecord.applyDelta(MetaPageUpdatePartitionDataRecord.java:109)
> at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyPageDelta(GridCacheDatabaseSharedManager.java:2532)
> at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$performBinaryMemoryRestore$11(GridCacheDatabaseSharedManager.java:2327)
> at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$stripedApplyPage$12(GridCacheDatabaseSharedManager.java:2441)
> at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$stripedApply$13(GridCacheDatabaseSharedManager.java:2479)
> at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:550)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> The reproducer is attached (works only on Linux).
> Steps to run the reproducer:
> 1. Copy config/server.xml into IGNITE_HOME/config folder;
> 2. Set IGNITE_HOME in the CorruptionReproducer class;
> 3. Launch CorruptionReproducer.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)