[jira] [Comment Edited] (IGNITE-7264) Caches with forward slash "/" in names cause problems for PDS

Stanislav Lukyanov (JIRA) Mon, 25 Dec 2017 06:40:57 -0800

    [ https://issues.apache.org/jira/browse/IGNITE-7264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16303293#comment-16303293 ]


Stanislav Lukyanov edited comment on IGNITE-7264 at 12/25/17 2:39 PM:
----------------------------------------------------------------------

There seem to be several flavors of this problem, but the one I can reproduce 
more or less consistently is:
1) Create a cache with a '/' in its name on a cluster with persistence enabled
2) Kill the cluster without deactivating it
    The key is to kill the cluster while the WAL still contains records that 
have not been applied to the persistent storage. On the next activation the 
node fails to find the cache but still tries to apply the WAL records related 
to it, and fails.
3) Restart the cluster and activate it
The following stack trace then appears; the NPE kills the ExchangeWorker 
thread but not the node:
    java.lang.NullPointerException
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyUpdate(GridCacheDatabaseSharedManager.java:1781)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:1641)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1074)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:865)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1035)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:649)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
        at java.lang.Thread.run(Thread.java:748)

The question is whether error handling should be more graceful when the WAL 
contains records for nonexistent caches.
Should the exchange worker be prepared for an unchecked exception during task 
execution? If not, should it trigger a node shutdown when it exits with an 
unhandled exception?
Should the WAL be allowed to contain a record for an unknown cache? Should such 
records be ignored?
At the very least, an `assert cacheCtx != null` in `applyLastUpdates` would 
be helpful.
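
To make the "ignore such records" option concrete, here is a minimal sketch of 
a replay loop that skips WAL records whose cache is unknown instead of hitting 
an NPE. All names in it (`WalReplaySketch`, `WalRecord`, the map-based caches) 
are hypothetical stand-ins and not Ignite classes; it only illustrates the 
null check and says nothing about what the real `applyLastUpdates` does or 
should do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types are Ignite classes. */
final class WalReplaySketch {
    /** Simplified WAL data record: the owning cache ID plus one key/value update. */
    static final class WalRecord {
        final int cacheId;
        final String key;
        final String val;

        WalRecord(int cacheId, String key, String val) {
            this.cacheId = cacheId;
            this.key = key;
            this.val = val;
        }
    }

    /**
     * Applies every record whose cache is known and returns the records that
     * were skipped because their cache ID is not registered.
     */
    static List<WalRecord> applyLastUpdates(List<WalRecord> wal,
                                            Map<Integer, Map<String, String>> caches) {
        List<WalRecord> skipped = new ArrayList<>();

        for (WalRecord rec : wal) {
            Map<String, String> cache = caches.get(rec.cacheId);

            if (cache == null) {
                // Unknown cache: report and skip rather than dereference null.
                System.err.println("Skipping WAL record for unknown cache, cacheId=" + rec.cacheId);
                skipped.add(rec);
                continue;
            }

            cache.put(rec.key, rec.val);
        }

        return skipped;
    }
}
```

Returning the skipped records leaves the caller free to treat them as a 
warning or as a fatal error, which is exactly the policy question raised above.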

This (potential) problem is not the root cause of the issue, though.



> Caches with forward slash "/" in names cause problems for PDS
> -------------------------------------------------------------
>
>                 Key: IGNITE-7264
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7264
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache, persistence
>    Affects Versions: 2.3
>            Reporter: Ilya Kasnacheev
>            Assignee: Stanislav Lukyanov
>
> If I create a cache named "caches/1", there is no immediate error, but 
> nodes fail when trying to rejoin the topology with storage already initialized.
> I think an immediate exception should be thrown in this case when persistence 
> is enabled.
> Moreover, I suggest first trying to create the directory, then making sure it 
> was created and that dir.parent == expected parent directory, because Windows 
> places more restrictions on file names, etc.
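
The check suggested in the description could look roughly like the sketch 
below. It assumes, purely for illustration, that each cache is stored in a 
directory named after the cache directly under a store root; the class and 
method names (`CacheDirCheck`, `createCacheDir`) are hypothetical and not part 
of Ignite. Unlike the suggestion, it compares the parent directory before 
creating anything, so a rejected name such as "caches/1" leaves no stray 
directories behind; the creation attempt afterwards still catches names the 
file system itself refuses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

/** Hypothetical sketch; not an Ignite class. */
final class CacheDirCheck {
    /**
     * Creates the directory for the given cache name directly under storeRoot.
     * Throws IllegalArgumentException if the name does not map to a direct child
     * of storeRoot, for example because it contains a path separator.
     */
    static Path createCacheDir(Path storeRoot, String cacheName) {
        Path root = storeRoot.toAbsolutePath().normalize();
        Path dir;

        try {
            dir = root.resolve(cacheName).normalize();
        } catch (InvalidPathException e) {
            // The platform rejects the name outright (e.g. illegal characters on Windows).
            throw new IllegalArgumentException("Cache name is not a valid directory name: " + cacheName, e);
        }

        // "caches/1" resolves to root/caches/1, whose parent is root/caches, not root.
        if (!root.equals(dir.getParent()))
            throw new IllegalArgumentException("Cache name does not map to a direct child of " + root + ": " + cacheName);

        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to create cache directory: " + dir, e);
        }

        // Make sure the directory really exists after the creation attempt.
        if (!Files.isDirectory(dir))
            throw new IllegalStateException("Cache directory was not created: " + dir);

        return dir;
    }
}
```

With this check, a name like "cache1" yields a directory directly under the 
store root, while "caches/1" is rejected with an immediate exception.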



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
