[
https://issues.apache.org/jira/browse/IGNITE-12375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matija Polajnar updated IGNITE-12375:
-------------------------------------
Description:
On a fairly complex Spring Boot application using embedded Ignite persistent
storage, we've managed (multiple times) to get into a situation where some
persistent caches start behaving inconsistently. The symptoms are as follows:
the caches' {{iterator()}} method returns the elements we previously put into
the caches, as expected, and {{size()}} also returns the expected value. But
{{containsKey(...)}} and {{get(...)}} return {{false}} and {{null}}
respectively for some (or all) keys that are expected to be in the cache and
are even returned by the {{iterator()}}.
The problem never starts mid-run, but always after a cluster restart; it does
not happen after every restart, though, and we suspect a necessary precondition
is that the cache configurations were slightly changed, for example by
modifying QueryEntities. We also suspect this only happens on single-node
clusters, so it might be related to IGNITE-12297, but the workaround that works
for that problem does not fix the problem described here.
The caches in question then cannot be repaired short of destroying and
re-creating them and re-importing data.
We tried and failed to reproduce the problem from scratch in a small demo
application. We did manage, however, to grab a {{work}} directory from our
application after corruption occurred, and then create a demo application with
a minimal set of classes needed to demonstrate the read-side issue (once the
corruption is already present).
I'm attaching a zip file with the code (along with a Maven {{pom.xml}}) and the
corrupted work directory. You can run the demo directly with {{mvn compile
exec:java}}, which executes the {{care.better.demo.ignitebug.BugApp}} class.
That class contains this method:
{code:java}
private static void replicateProblem(IgniteCache<Object, Object> cache) {
    int seen = 0;
    Iterator<Cache.Entry<Object, Object>> entryIterator = cache.iterator();
    while (entryIterator.hasNext()) {
        Object key = entryIterator.next().getKey();
        if (!cache.containsKey(key) || cache.get(key) == null) {
            LOG.error("UNSEEN KEY: {}", key);
        } else {
            seen++;
        }
    }
    LOG.info("Size {}, seen {}.", cache.size(), seen);
}
{code}
After execution you will see log records like this one:
{noformat}
ERROR care.better.demo.ignitebug.BugApp.replicateProblem - UNSEEN KEY: QueueKey{affinityKey=PartyIdArg{namespace='ЭМИАС Медработники', id='222'}, entryId=c059b587-78d3-4c75-b64f-8575ae3d2318}
{noformat}
We had no success finding any lead while debugging through the Ignite source
code, so we kindly ask for your assistance in hunting down this bug and, until
it is fixed, in suggesting a possible workaround should this occur in a
production environment (it has not so far), where it is not practical to dump
all data from a cache into a file in order to destroy, re-create and re-import
it.
> Inconsistent persistent cache behaviour: containsKey returns false on a key
> returned by iterator
> ------------------------------------------------------------------------------------------------
>
> Key: IGNITE-12375
> URL: https://issues.apache.org/jira/browse/IGNITE-12375
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.7, 2.7.6
> Reporter: Matija Polajnar
> Priority: Major
> Attachments: ignite-bug.zip
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)