[ https://issues.apache.org/jira/browse/IGNITE-7264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16303293#comment-16303293 ]
Stanislav Lukyanov edited comment on IGNITE-7264 at 12/25/17 2:39 PM:
----------------------------------------------------------------------

There seem to be several flavors of this problem, but the one I can more or less consistently reproduce is:

1) Create a cache with a '/' in the name on a cluster with persistence enabled
2) Kill the cluster without deactivation
   The key is to kill the cluster with records left in WAL that are not applied to the persistence storage. Then the activation fails to find the cache, but tries to apply WAL records related to that cache - and fails.
3) Restart the cluster and activate it
   Then the following stack trace appears, with the NPE killing the ExchangeWorker thread but not the node:

java.lang.NullPointerException
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyUpdate(GridCacheDatabaseSharedManager.java:1781)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:1641)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1074)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:865)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1035)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:649)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)

The question is whether there should be more graceful error handling in the case when WAL contains records for non-existing caches.
Should the exchange worker be prepared for an unchecked exception during a task execution? If not, should it trigger node shutdown if it exits with an unhandled exception?
Should WAL be allowed to contain a record for an unknown cache? Should such records be ignored?
At the very least, an `assert cacheCtx != null` in `applyLastUpdates` would be helpful.

This (potential) problem is not the root cause of the issue though.

> Caches with forward slash "/" in names cause problems for PDS
> -------------------------------------------------------------
>
>                 Key: IGNITE-7264
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7264
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache, persistence
>    Affects Versions: 2.3
>            Reporter: Ilya Kasnacheev
>            Assignee: Stanislav Lukyanov
>
> If I am to create cache with name "caches/1", there's no immediate error, but
> nodes fail when trying to rejoin topology with storage already initialized.
> I think there should be an immediate exception in case persistence is enabled
> for such case.
> Moreover, I suggest first trying to create directory, then making sure it was
> created and that dir.parent == expected parent directory. Because on Windows
> there are more restrictions on FS file names, etc...

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
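
For illustration, the check proposed in the issue description (create the cache directory, then make sure it exists and that dir.parent == expected parent directory) could look roughly like the sketch below. This is not Ignite code: the class name `CacheDirCheck`, the method `createCacheDir` and the `cache-` directory prefix are assumptions made up for the example. It also deviates slightly from the description by checking the parent before creating anything, so that a rejected name leaves no stray directories on disk.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

/** Hypothetical helper, not part of Ignite. */
final class CacheDirCheck {
    /** Assumed directory prefix, for illustration only. */
    private static final String DIR_PREFIX = "cache-";

    /**
     * Creates the storage directory for a cache and fails fast if the cache
     * name would place it anywhere other than directly under storeRoot.
     */
    static Path createCacheDir(Path storeRoot, String cacheName) throws IOException {
        Path root = storeRoot.toAbsolutePath().normalize();
        Path dir;

        try {
            // Names with characters the file system forbids fail here
            // (the set is larger on Windows than on Linux).
            dir = root.resolve(DIR_PREFIX + cacheName).normalize();
        }
        catch (InvalidPathException e) {
            throw new IllegalArgumentException(
                "Cache name is not a valid directory name: " + cacheName, e);
        }

        // "caches/1" resolves to <root>/cache-caches/1, whose parent is not <root>.
        if (!root.equals(dir.getParent()))
            throw new IllegalArgumentException(
                "Cache name maps outside the expected parent directory: " + cacheName);

        Files.createDirectories(dir);

        // Make sure the directory really exists after creation.
        if (!Files.isDirectory(dir))
            throw new IOException("Failed to create cache directory: " + dir);

        return dir;
    }
}
```

With a check like this, creating a cache named "caches/1" with persistence enabled would fail immediately with an exception instead of failing later, when nodes rejoin the topology.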