[ https://issues.apache.org/jira/browse/IGNITE-11699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823077#comment-16823077 ]

Vyacheslav Koptilin commented on IGNITE-11699:
----------------------------------------------

The root cause of the described behavior is that {{SegmentRouter}} is not aware
of the mode in which the WAL archive is disabled.
Under some circumstances, this causes {{GridCacheDatabaseSharedManager#walTail}}
to point to an incorrect WAL segment after physical/logical restore, which may
break the node restart procedure.
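For illustration only, a minimal sketch of why segment routing has to know about the archive-disabled mode. It rests on assumptions: {{SegmentRouterSketch}} and its methods are hypothetical (this is not Ignite's {{SegmentRouter}}), segment files are assumed to be named by a 16-digit zero-padded index plus {{.wal}}, archiving is assumed to be disabled when the WAL path and the archive path are the same directory, and not-yet-archived work segments are assumed to rotate as {{absIdx % walSegments}}.

{code:java}
import java.nio.file.Path;

/**
 * Hypothetical, simplified model of WAL segment routing (NOT Ignite's SegmentRouter).
 * Assumes segment files are named "<16-digit zero-padded index>.wal".
 */
class SegmentRouterSketch {
    private final Path walDir;             // WAL work directory.
    private final Path archiveDir;         // WAL archive directory.
    private final int walSegments;         // Number of rotating work segment files.
    private final boolean archiverEnabled;

    SegmentRouterSketch(Path walDir, Path archiveDir, int walSegments) {
        this.walDir = walDir;
        this.archiveDir = archiveDir;
        this.walSegments = walSegments;
        // Assumption: archiving is disabled when both paths are the same directory.
        this.archiverEnabled = !walDir.equals(archiveDir);
    }

    /** Resolves the file that holds the segment with the given absolute index. */
    Path findSegment(long absIdx, long lastArchivedIdx) {
        if (!archiverEnabled) {
            // No archiver: segments are never moved, so the absolute index is the file name.
            return walDir.resolve(fileName(absIdx));
        }

        if (absIdx <= lastArchivedIdx) {
            // Already archived: the archive keeps absolute indexes.
            return archiveDir.resolve(fileName(absIdx));
        }

        // Not yet archived: the segment lives in one of the rotating work files.
        return walDir.resolve(fileName(absIdx % walSegments));
    }

    /** Builds a segment file name from an index, e.g. 7 -> "0000000000000007.wal". */
    static String fileName(long idx) {
        return String.format("%016d.wal", idx);
    }
}
{code}

With the archive disabled and 10 work segments, segment 12 resolves to {{0000000000000012.wal}} in the WAL directory. A router that ignored the archive-disabled mode would fall through to the rotating-file branch and return {{0000000000000002.wal}}, an older segment, which is the kind of wrong {{walTail}} position described in the comment.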

I should also mention the following issues:
 - First, the handling of exceptions while a node applies physical/logical
updates does not seem correct: currently, all runtime exceptions are ignored.
 - Constructors of WALRecord subclasses (for example, InitNewPageRecord) may
throw AssertionError, which can break the physical/logical restore of a node.
 A corrupted entry/record may trigger an assertion error that cannot be handled
properly. So, I think we should avoid that, or explicitly throw
IgniteCheckedException instead of IgniteException, AssertionError, etc.
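A minimal sketch of the first point, under a simplified model: {{StripedRestoreSketch}}, {{DeltaRecord}} and {{RestoreException}} are hypothetical names, and {{RestoreException}} only stands in for IgniteCheckedException so the snippet compiles without Ignite. Records are applied on a thread pool, and the first RuntimeException or AssertionError thrown by a worker is captured and rethrown to the caller as a checked exception instead of only being logged in the worker thread. Unlike Ignite's striped restore, this sketch does not route records of the same page to the same thread, so it illustrates error propagation only.

{code:java}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for IgniteCheckedException (not Ignite's class). */
class RestoreException extends Exception {
    RestoreException(String msg, Throwable cause) {
        super(msg, cause);
    }
}

/** Hypothetical delta record that mutates the page it belongs to. */
interface DeltaRecord {
    void applyDelta();
}

class StripedRestoreSketch {
    /** Applies records on a pool and fails the whole restore if any record failed. */
    static void applyAll(List<DeltaRecord> records, int stripes)
        throws RestoreException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(stripes);
        AtomicReference<Throwable> firstFailure = new AtomicReference<>();

        for (DeltaRecord rec : records) {
            pool.execute(() -> {
                if (firstFailure.get() != null)
                    return; // Restore already failed: skip the remaining records.

                try {
                    rec.applyDelta();
                }
                catch (RuntimeException | AssertionError e) {
                    // Keep the first failure so the caller can rethrow it.
                    firstFailure.compareAndSet(null, e);
                }
            });
        }

        pool.shutdown();

        if (!pool.awaitTermination(1, TimeUnit.MINUTES))
            throw new RestoreException("Timed out while applying WAL records", null);

        Throwable failure = firstFailure.get();

        if (failure != null)
            throw new RestoreException("Failed to apply WAL delta record", failure);
    }
}
{code}

A record that throws, for example, IllegalStateException from a corrupted page then fails the restore with a checked exception whose cause is the original error, so the node can stop instead of continuing with a partially applied WAL.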
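A minimal sketch of the second point, under stated assumptions: {{InitNewPageRecordSketch}} and {{CorruptedWalRecordException}} are hypothetical (the latter stands in for IgniteCheckedException), and the field checks are illustrative rather than Ignite's real invariants. Validation moves out of {{assert}} statements into a factory method that throws a checked exception. Java assertions are skipped unless the JVM runs with {{-ea}}, so an {{assert}}-based check either throws an unchecked AssertionError or silently lets a corrupted value through.

{code:java}
/** Hypothetical stand-in for IgniteCheckedException (not Ignite's class). */
class CorruptedWalRecordException extends Exception {
    CorruptedWalRecordException(String msg) {
        super(msg);
    }
}

/**
 * Simplified, hypothetical record loosely modeled on InitNewPageRecord.
 * The checks below are illustrative, not Ignite's actual invariants.
 */
final class InitNewPageRecordSketch {
    final long pageId;
    final int ioType;
    final int ioVer;

    private InitNewPageRecordSketch(long pageId, int ioType, int ioVer) {
        this.pageId = pageId;
        this.ioType = ioType;
        this.ioVer = ioVer;
    }

    /** Validates values read from the WAL and reports corruption as a checked exception. */
    static InitNewPageRecordSketch create(long pageId, int ioType, int ioVer)
        throws CorruptedWalRecordException {
        if (ioType <= 0 || ioType > 0xFFFF)
            throw new CorruptedWalRecordException("Invalid IO type [pageId=" +
                Long.toHexString(pageId) + ", ioType=" + ioType + ']');

        if (ioVer <= 0 || ioVer > 0xFFFF)
            throw new CorruptedWalRecordException("Invalid IO version [pageId=" +
                Long.toHexString(pageId) + ", ioVer=" + ioVer + ']');

        return new InitNewPageRecordSketch(pageId, ioType, ioVer);
    }
}
{code}

Because the exception is checked, the code that reads records during physical/logical restore is forced to decide how to handle corruption explicitly, instead of an AssertionError escaping through the restore path.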

> Node can't start after forced shutdown if the wal archiver disabled
> -------------------------------------------------------------------
>
>                 Key: IGNITE-11699
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11699
>             Project: Ignite
>          Issue Type: Bug
>          Components: persistence
>    Affects Versions: 2.7
>            Reporter: Pavel Vinokurov
>            Assignee: Vyacheslav Koptilin
>            Priority: Major
>         Attachments: disabled-wal-archive-reproducer.zip
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> If a server node is killed while the WAL archive is disabled, it may fail on
> start with the following exception:
> {code:java}
> [18:37:53,887][SEVERE][sys-stripe-1-#2][G] Failed to execute runnable.
> java.lang.IllegalStateException: Failed to get page IO instance (page content is corrupted)
>       at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:85)
>       at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:97)
>       at org.apache.ignite.internal.pagemem.wal.record.delta.MetaPageUpdatePartitionDataRecord.applyDelta(MetaPageUpdatePartitionDataRecord.java:109)
>       at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyPageDelta(GridCacheDatabaseSharedManager.java:2532)
>       at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$performBinaryMemoryRestore$11(GridCacheDatabaseSharedManager.java:2327)
>       at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$stripedApplyPage$12(GridCacheDatabaseSharedManager.java:2441)
>       at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$stripedApply$13(GridCacheDatabaseSharedManager.java:2479)
>       at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:550)
>       at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
> The reproducer is attached (it works only on Linux).
> Steps to run the reproducer:
> 1. Copy config/server.xml into the IGNITE_HOME/config folder;
> 2. Set IGNITE_HOME in the CorruptionReproducer class;
> 3. Launch CorruptionReproducer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)