[ https://issues.apache.org/jira/browse/IGNITE-7264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16303293#comment-16303293 ]
Stanislav Lukyanov commented on IGNITE-7264:
--------------------------------------------
There seem to be several flavors of this problem, but the one I can more or less
consistently reproduce is:
1) Create a cache with a '/' in its name on a cluster with persistence enabled.
2) Kill the cluster without deactivating it.
   The key is to kill the cluster while the WAL still contains records that
   have not yet been applied to the persistent storage. On the next activation
   the node fails to find the cache, but still tries to apply the WAL records
   related to that cache, and fails.
3) Restart the cluster and activate it.
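A rough illustration of why a '/' can break the on-disk layout, assuming (for this sketch only) that each cache is stored in a directory named "cache-" + cacheName under the node's work directory. java.nio.file.Path.resolve parses the '/' as a name separator, so the cache directory ends up two levels deep and its last path element no longer matches the cache name. CacheDirLayout and cacheDir are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class CacheDirLayout {
    // Hypothetical layout for this sketch: one "cache-<name>" directory per cache.
    static Path cacheDir(Path workDir, String cacheName) {
        return workDir.resolve("cache-" + cacheName);
    }

    public static void main(String[] args) {
        Path workDir = Paths.get("work", "db", "node00");

        Path plain = cacheDir(workDir, "caches1");
        System.out.println(plain.getParent().equals(workDir));   // true

        // '/' is parsed as a name separator, so this directory is nested two levels deep.
        Path nested = cacheDir(workDir, "caches/1");
        System.out.println(nested.getParent().equals(workDir));  // false
        System.out.println(nested.getFileName());                // 1
    }
}
```

Under that assumption, a restart-time scan for "cache-*" entries directly under the work directory would find "cache-caches" rather than a directory for "caches/1", which is one plausible way the cache goes missing while its WAL records remain.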
Then the following stack trace appears, and the NPE kills the ExchangeWorker
thread but not the node:
java.lang.NullPointerException
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyUpdate(GridCacheDatabaseSharedManager.java:1781)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:1641)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1074)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:865)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1035)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:649)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)
The question is whether there should be more graceful error handling when the
WAL contains records for non-existent caches.
Should the exchange worker be prepared for an unchecked exception during task
execution? If not, should it trigger a node shutdown when it exits with an
unhandled exception?
Should the WAL be allowed to contain records for an unknown cache? Should such
records be ignored?
At the very least, an `assert cacheCtx != null` in `applyLastUpdates` would
be helpful.
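A minimal sketch of the two WAL options, using simplified stand-in types (DataRecord, CacheCtx and this applyLastUpdates signature are hypothetical and much simpler than Ignite's classes): look up the cache context explicitly, then either skip records for unknown cache IDs and report them, or fail with a message that names the cache instead of dereferencing a null context.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WalReplaySketch {
    // Hypothetical, heavily simplified WAL data record.
    static final class DataRecord {
        final int cacheId;
        final String key;
        final String value;

        DataRecord(int cacheId, String key, String value) {
            this.cacheId = cacheId;
            this.key = key;
            this.value = value;
        }
    }

    // Hypothetical stand-in for a cache context: just a key/value map.
    static final class CacheCtx {
        final Map<String, String> data = new HashMap<>();
    }

    /**
     * Replays records for known caches. In strict mode an unknown cache ID is a
     * descriptive error; otherwise the record is skipped and its ID returned.
     */
    static List<Integer> applyLastUpdates(List<DataRecord> wal, Map<Integer, CacheCtx> caches, boolean strict) {
        List<Integer> skipped = new ArrayList<>();
        for (DataRecord rec : wal) {
            CacheCtx cacheCtx = caches.get(rec.cacheId);
            if (cacheCtx == null) {
                if (strict)
                    throw new IllegalStateException("WAL contains updates for unknown cache [cacheId=" + rec.cacheId + ']');
                skipped.add(rec.cacheId);
                continue;
            }
            cacheCtx.data.put(rec.key, rec.value);
        }
        return skipped;
    }

    public static void main(String[] args) {
        Map<Integer, CacheCtx> caches = new HashMap<>();
        caches.put(1, new CacheCtx());
        List<DataRecord> wal = Arrays.asList(
            new DataRecord(1, "a", "x"),
            new DataRecord(2, "b", "y")); // cache 2 was not found after restart
        System.out.println(applyLastUpdates(wal, caches, false)); // [2]
        System.out.println(caches.get(1).data);                   // {a=x}
    }
}
```

Skipping keeps the node alive but silently drops data for the missing cache, so the returned IDs would at least need to be logged; the strict mode trades availability for a clear error.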
> Caches with forward slash "/" in names cause problems for PDS
> -------------------------------------------------------------
>
> Key: IGNITE-7264
> URL: https://issues.apache.org/jira/browse/IGNITE-7264
> Project: Ignite
> Issue Type: Bug
> Components: cache, persistence
> Affects Versions: 2.3
> Reporter: Ilya Kasnacheev
> Assignee: Stanislav Lukyanov
>
> If I create a cache named "caches/1", there's no immediate error, but
> nodes fail when trying to rejoin the topology with storage already initialized.
> I think there should be an immediate exception in this case when persistence
> is enabled.
> Moreover, I suggest first trying to create the directory, then making sure it
> was created and that dir.parent == the expected parent directory, because
> Windows places more restrictions on file names, etc.
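The suggestion above could look roughly like this in plain java.nio.file. createCacheDir and the "cache-" directory prefix are assumptions for illustration, not Ignite code. The sketch rejects separators before touching the disk, maps names the current OS cannot represent to an IllegalArgumentException, then creates the directory and verifies that it exists directly under the expected parent.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

final class CacheDirCheck {
    // Hypothetical naming scheme: one "cache-<name>" directory per cache.
    static Path createCacheDir(Path workDir, String cacheName) throws IOException {
        // Fail fast, before anything is written, on names containing separators.
        if (cacheName.indexOf('/') >= 0 || cacheName.indexOf('\\') >= 0)
            throw new IllegalArgumentException("Cache name must not contain path separators: " + cacheName);

        Path expectedParent = workDir.toAbsolutePath().normalize();
        Path dir;
        try {
            dir = expectedParent.resolve("cache-" + cacheName).normalize();
        }
        catch (InvalidPathException e) {
            // E.g. characters the current OS does not allow in file names.
            throw new IllegalArgumentException("Invalid cache name for this OS: " + cacheName, e);
        }

        // Create first, then verify the directory exists directly under the expected parent.
        Files.createDirectories(dir);
        if (!Files.isDirectory(dir) || !expectedParent.equals(dir.getParent()))
            throw new IllegalArgumentException("Cache name maps to an unexpected directory: " + dir);
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("pds-sketch");
        System.out.println(createCacheDir(work, "caches1").getFileName()); // cache-caches1
        try {
            createCacheDir(work, "caches/1");
        }
        catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```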
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)