On 26 Sep 2012, at 12:44, Dan Berindei wrote:

> On Tue, Sep 25, 2012 at 10:53 AM, Manik Surtani wrote:
> 
> On 24 Sep 2012, at 16:22, Dan Berindei wrote:
> 
>> Hi guys
>> 
>> During the final push for NBST I found a bug with preloading (entries that 
>> didn't belong on a joiner weren't removed after the initial state transfer). 
>> I decided to fix it and https://issues.jboss.org/browse/ISPN-1586 at the 
>> same time, since it was a longstanding bug and I had a reasonable idea on 
>> what to do. However, I missed some implications and I need to fix them - 
>> there is at least one Query test failing because of my change 
>> (SharedCacheLoaderQueryIndexTest).
>> 
>> In 5.1, preloading worked like this:
>> 1. Start the CacheLoaderManager, which preloads everything from the cache 
>> store into memory.
>> 2. Start the StateTransferManager, retrieving data from the other cache 
>> members and overwriting already-preloaded values.
>> 3. When the initial state transfer ends, entries not owned by the local node 
>> are deleted.
>> 
>> The main issue with this, raised in ISPN-1586, is that entries that were 
>> deleted on the other cache members are "revived" on the joiner when it reads 
>> the data from the cache store. There is another performance issue, because 
>> we load a lot of data that we then discard, but that's less important.
>> 
>> With the ISPN-1586 fix, preloading should work like this:
>> 1. Start the StateTransferManager, receive initial CH.
>> 2. If the local node is not the first to start up, fetching state (either 
>> in-memory or persistent) is enabled, and the cache store is non-shared, 
>> clear the cache store.
>> 3. Start the CacheLoaderManager, which preloads the cache store into memory, 
>> but only if the local node is the first one to have started the cache OR if 
>> fetching state is disabled.
>> 4. Run the initial state transfer, retrieving data from the other cache 
>> members (if any, and if fetching state is enabled).
>> 
>> This solves ISPN-1586, but it does mean that data from non-shared cache 
>> stores will be lost on all the nodes except the first that starts up. So if 
>> the last node to shut down is not the first node to start back up, the 
>> cluster will lose data.
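
[Editorial sketch] The four startup steps above boil down to three yes/no decisions per node. The Java below is only an illustration of that decision table. JoinerStartupPlan, its fields and decide() are hypothetical names, not Infinispan classes, and the sketch assumes that preloading is configured and that "fetching state" can be treated as a single flag.

```java
// Hypothetical sketch: JoinerStartupPlan is an illustrative name,
// not part of the Infinispan API.
final class JoinerStartupPlan {
    final boolean clearStore;        // step 2: wipe the local cache store
    final boolean preload;           // step 3: preload the store into memory
    final boolean runStateTransfer;  // step 4: fetch state from other members

    private JoinerStartupPlan(boolean clearStore, boolean preload,
                              boolean runStateTransfer) {
        this.clearStore = clearStore;
        this.preload = preload;
        this.runStateTransfer = runStateTransfer;
    }

    // Assumption: preloading is configured, and "fetching state" (in-memory
    // or persistent) is collapsed into one boolean.
    static JoinerStartupPlan decide(boolean firstToStart,
                                    boolean fetchStateEnabled,
                                    boolean sharedStore) {
        // Step 2: clear only a non-shared store, only on a joiner that
        // will fetch state from the running members.
        boolean clear = !firstToStart && fetchStateEnabled && !sharedStore;
        // Step 3: preload only on the first node, or when fetching state
        // is disabled.
        boolean preload = firstToStart || !fetchStateEnabled;
        // Step 4: state transfer needs other members and fetching enabled.
        boolean transfer = !firstToStart && fetchStateEnabled;
        return new JoinerStartupPlan(clear, preload, transfer);
    }

    public static void main(String[] args) {
        JoinerStartupPlan joiner = decide(false, true, false);
        System.out.println("clearStore=" + joiner.clearStore
                + " preload=" + joiner.preload
                + " runStateTransfer=" + joiner.runStateTransfer);
    }
}
```

With these rules, a joiner that has a non-shared store and fetching enabled always clears its store, which is where the data-loss scenario described above comes from.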
>> 
>> These are the alternatives I'm considering:
>> a) Finish the ISPN-1586 fix and clearly document that non-shared cache 
>> stores don't guarantee persistence after cluster restart (unless the last 
>> cache to stop is the first to start back up and shutdown was spaced out to 
>> allow state transfer to move everything to the last node).
>> b) Revert my ISPN-1586 fix and allow "zombie" cache entries on the joiners 
>> (leaving ISPN-1586 open).
> 
> Maybe another approach could be:
> 
> 1. Start the STM, retrieve initial CH
> 2. If the local node… (as above) … is non-shared, *don't clear it*, but mark 
> the node so preloading is *deferred*.
> 3. Start the CLM … skip the preload if we marked it as deferred in step 2.
> 4. Run initial state transfer.  This will write newer versions of entries to 
> the cache store if needed.
> 5. Now, if preloading has been deferred in step 2, start a preload, if we're 
> configured to do any preloading.
> 
> This should give us consistency.
> 
> 
> Nope, this doesn't solve ISPN-1586: if the already-running members have 
> deleted a key, the deferred preload on the joiner can still load that key 
> from its cache store. In fact, the preload doesn't even matter here: just the 
> fact that the key is still in the cache store means that the node can still 
> return a non-null value for a deleted key.
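
[Editorial sketch] The "zombie" read described above can be reproduced with a toy model. Everything in it is an assumption made for illustration: plain maps stand in for memory, the local cache store and the cluster state, key ownership is ignored, and ZombieEntryDemo is a hypothetical name with no relation to Infinispan code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; none of these names are Infinispan API.
final class ZombieEntryDemo {

    // Read-through lookup: check memory first, then the local cache store.
    static String get(Map<String, String> memory,
                      Map<String, String> store, String key) {
        String value = memory.get(key);
        return value != null ? value : store.get(key);
    }

    // State transfer modelled as copying live entries only. It carries
    // no information about keys that were removed on the other members.
    static void applyStateTransfer(Map<String, String> clusterState,
                                   Map<String, String> memory,
                                   Map<String, String> store) {
        memory.putAll(clusterState);
        store.putAll(clusterState);
    }

    // Returns what a read of "k1" yields on the joiner after it rejoins.
    static String readAfterRejoin(boolean clearStoreFirst) {
        Map<String, String> store = new HashMap<>();
        store.put("k1", "stale");   // written before the joiner went down
        store.put("k2", "old");

        Map<String, String> clusterState = new HashMap<>();
        clusterState.put("k2", "new");  // k1 was deleted in the meantime

        Map<String, String> memory = new HashMap<>();
        if (clearStoreFirst) {
            store.clear();              // the clear step from the ISPN-1586 fix
        }
        applyStateTransfer(clusterState, memory, store);
        return get(memory, store, "k1");
    }

    public static void main(String[] args) {
        System.out.println("store kept:    " + readAfterRejoin(false));
        System.out.println("store cleared: " + readAfterRejoin(true));
    }
}
```

In this model no preload ever runs, yet the deleted key is still readable whenever the store is kept, because state transfer overwrites entries but never removes them.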

The proper solution for this problem is based on versioning. Until then, 
returning a non-null value for a deleted key is a consistency issue, and we 
shouldn't allow it to happen.
+1 for a) and a proper fix based on versioning.

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)




_______________________________________________
infinispan-dev mailing list
https://lists.jboss.org/mailman/listinfo/infinispan-dev