[ 
https://issues.apache.org/jira/browse/IGNITE-13976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262787#comment-17262787
 ] 

Ilya Kasnacheev commented on IGNITE-13976:
------------------------------------------

[~av] can you please review the PR and hint at what the root cause of the 
problem is? The main reproducer is WalDisableTest (a runnable program with a 
main() method).

I have devised a sequential test (see the PR) and made it pass with a hack, but 
that did not fix the root cause. I have also tried patching in another place, 
but that did not fix the reproducer.

I don't understand why a fresh node has any 
org.apache.ignite.internal.processors.cache.CacheGroupDescriptor#walChangeReqs 
if a WAL change is a PME and is purely sequential. I also don't understand why 
the WAL status is not propagated from the cache information to the cache group.

> WAL disable/enable with node restarts results in mismatching state, data loss
> -----------------------------------------------------------------------------
>
>                 Key: IGNITE-13976
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13976
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.9.1
>            Reporter: Ilya Kasnacheev
>            Assignee: Ilya Kasnacheev
>            Priority: Major
>
> If you try to enable/disable WAL on an unstable topology, you will get into a 
> state where the WAL status is undefined: nodes might have different WAL 
> status, and the only way to fix it is to restart the cluster, which will lead 
> to data loss because Ignite removes data if WAL is disabled on restart.
> See the reproducer in PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
