[ 
https://issues.apache.org/jira/browse/IGNITE-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438960#comment-16438960
 ] 

Alexander Belyak commented on IGNITE-8119:
------------------------------------------

It shouldn't, but if PDS is broken on only one node, Ignite should keep 
operating correctly on the other nodes in the cluster. 
It would also be great if Ignite reported a meaningful error message when some 
significant part of PDS is absent or outdated (copied from different backups, 
removed or partially restored). This can be achieved by simply adding a marker 
file to every significant part of PDS: the db, wal, wal_archive, binary_meta, 
marshaller and snapshot folders.
On close we can:
1) write a pds_<new_uuid>.log file with "timestamp" and "prev_uuid" into each 
significant folder
2) remove the old pds_<prev_uuid>.log files from each significant folder
On start we can:
1) read all pds_<uuid>.log files from each significant folder
2) compare the uuids from all folders and, if some uuid differs (taking into 
account that both the new and the prev marker may be present if the node 
crashed during the write), report something like "Can't start with WAL_ARCHIVE 
<path> with timestamp <timestamp>, because DB <path> has timestamp 
<timestamp>" or "Can't start without WAL_ARCHIVE <path>, because DB <path> was 
populated at <timestamp>". 
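
The close/start steps above could be sketched roughly as follows. This is only 
an illustration of the idea, not existing Ignite code: the class PdsMarkers, 
its method names, the key=value marker format and the error texts are all 
assumptions. A crash during the write is tolerated by requiring only that all 
marked folders share at least one uuid.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

/** Hypothetical helper (not part of Ignite): marker files for PDS folders. */
final class PdsMarkers {
    /** Collects the uuids of all pds_<uuid>.log files found in a folder. */
    public static Set<String> readUuids(Path dir) throws IOException {
        Set<String> uuids = new HashSet<>();
        if (!Files.isDirectory(dir))
            return uuids;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "pds_*.log")) {
            for (Path f : files) {
                String name = f.getFileName().toString();
                uuids.add(name.substring("pds_".length(), name.length() - ".log".length()));
            }
        }
        return uuids;
    }

    /** On close: write a new marker into every folder, then remove the old ones. */
    public static String writeMarkers(List<Path> dirs, String prevUuid) throws IOException {
        String newUuid = UUID.randomUUID().toString();
        String body = "timestamp=" + System.currentTimeMillis() + "\nprev_uuid=" + prevUuid + "\n";
        for (Path dir : dirs) {
            Files.createDirectories(dir);
            Files.write(dir.resolve("pds_" + newUuid + ".log"), body.getBytes(StandardCharsets.UTF_8));
        }
        // Old markers are removed only after every folder has received the new one.
        for (Path dir : dirs)
            for (String uuid : readUuids(dir))
                if (!uuid.equals(newUuid))
                    Files.delete(dir.resolve("pds_" + uuid + ".log"));
        return newUuid;
    }

    /** On start: returns an empty list if the folders agree, error messages otherwise. */
    public static List<String> check(List<Path> dirs) throws IOException {
        List<String> errors = new ArrayList<>();
        List<Path> unmarked = new ArrayList<>();
        Set<String> common = null; // uuids present in every marked folder
        for (Path dir : dirs) {
            Set<String> uuids = readUuids(dir);
            if (uuids.isEmpty())
                unmarked.add(dir);
            else if (common == null)
                common = new HashSet<>(uuids);
            else
                common.retainAll(uuids);
        }
        if (common == null)
            return errors; // no markers anywhere: treated as a fresh store
        for (Path dir : unmarked)
            errors.add("Can't start without " + dir + ", because other PDS folders are populated");
        if (common.isEmpty())
            errors.add("Can't start: PDS folders carry markers from different shutdowns: " + dirs);
        return errors;
    }
}
```

With this sketch, a node would call PdsMarkers.check(...) before restoring 
state and refuse to start if the returned list is not empty; reading the 
"timestamp" value back from the marker to put it into the message is left out 
for brevity.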

> NPE on clear DB and unclear WAL/WAL_ARCHIVE
> -------------------------------------------
>
>                 Key: IGNITE-8119
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8119
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 2.4
>            Reporter: Alexander Belyak
>            Priority: Major
>         Attachments: ClearTest.java, ClearTestP.java
>
>
> 1) Start grid (1 node will be enough), activate it and populate some data
> 2) Stop node and clear db folder
> 3) Start grid and activate it
> Expected result:
> An error about inconsistent storage configuration, with or without starting 
> the node with such a store
> Actual result:
> The exchange-worker on the node stops with an NPE, which can hang the whole 
> cluster by preventing any PME operation from completing.
> {noformat}
> Failed to reinitialize local partitions (preloading will be stopped): 
> GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=1, 
> minorTopVer=1], ...
> java.lang.NullPointerException
>       at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyUpdate(GridCacheDatabaseSharedManager.java:2354)
>       at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2099)
>       at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1325)
>       at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1113)
>       at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1063)
>       at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:661)
>       at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2329)
>       at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>       at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
