On Tue, Sep 25, 2012 at 10:53 AM, Manik Surtani <[email protected]> wrote:
> On 24 Sep 2012, at 16:22, Dan Berindei <[email protected]> wrote:
>
> > Hi guys
> >
> > During the final push for NBST I found a bug with preloading (entries
> > that didn't belong on a joiner weren't removed after the initial state
> > transfer). I decided to fix it and
> > https://issues.jboss.org/browse/ISPN-1586 at the same time, since it
> > was a longstanding bug and I had a reasonable idea on what to do.
> > However, I missed some implications and I need to fix them - there is
> > at least one Query test failing because of my change
> > (SharedCacheLoaderQueryIndexTest).
> >
> > In 5.1, preloading worked like this:
> > 1. Start the CacheLoaderManager, which preloads everything from the
> >    cache store in memory.
> > 2. Start the StateTransferManager, retrieving data from the other
> >    cache members and overwriting already-preloaded values.
> > 3. When the initial state transfer ends, entries not owned by the
> >    local node are deleted.
> >
> > The main issue with this, raised in ISPN-1586, is that entries that
> > were deleted on the other cache members are "revived" on the joiner
> > when it reads the data from the cache store. There is another
> > performance issue, because we load a lot of data that we then discard,
> > but that's less important.
> >
> > With the ISPN-1586 fix, preloading should work like this:
> > 1. Start the StateTransferManager, receive initial CH.
> > 2. If the local node is not the first to start up, fetching state
> >    (either in-memory or persistent) is enabled and the cache store is
> >    non-shared, clear it.
> > 3. Start the CacheLoaderManager, which preloads the cache store in
> >    memory - but only if the local node is the first one having started
> >    the cache OR if the fetching state is disabled.
> > 4. Run the initial state transfer, retrieving data from the other
> >    cache members (if any, and if fetching state is enabled).
> >
> > This solves ISPN-1586, but it does mean that data from non-shared
> > cache stores will be lost on all the nodes except the first that
> > starts up. So if the last node to shut down is not the first node to
> > start back up, the cluster will lose data.
> >
> > These are the alternatives I'm considering:
> > a) Finish the ISPN-1586 fix and clearly document that non-shared cache
> >    stores don't guarantee persistence after cluster restart (unless
> >    the last cache to stop is the first to start back up and shutdown
> >    was spaced out to allow state transfer to move everything to the
> >    last node).
> > b) Revert my ISPN-1586 fix and allow "zombie" cache entries on the
> >    joiners (leaving ISPN-1586 open).
>
> Maybe another approach could be:
>
> 1. Start the STM, retrieve initial CH
> 2. If the local node… (as above) … is non-shared, *don't clear it*, but
>    mark the node so preloading is *deferred*.
> 3. Start the CLM … skip preload if we mark it as deferred, in step 2.
> 4. Run initial state transfer. This will write newer versions of
>    entries to the cache store if needed.
> 5. Now, if preloading has been deferred in step 2, start a preload, if
>    we're configured to do any preloading.
>
> This should give us consistency.

Nope, this doesn't solve ISPN-1586: if the already-running members have
deleted a key, the deferred preload on the joiner can still load that
key from its cache store. In fact, the preload doesn't even matter here:
just the fact that the key is still in the cache store means that the
node can still return a non-null value for a deleted key.

This is why I added the clear step in my algorithm: to avoid
resurrecting removed keys without receiving any tombstones through state
transfer.

> > I think there may be a third option:
> > c) Make preload a JMX operation and allow the user to run a
> >    cluster-wide preload once all the nodes in the cluster have started
> >    up. But this looks a little complicated, and it would require
> >    either versioning or prohibiting external cache writes until the
> >    cluster-wide preload is done to ensure consistency.
>
> I'm not sure how having this as a JMX option helps. Having versioning,
> etc. solves the problem even with an automatic preload.

Agree, just having this as an option in JMX doesn't fix anything. But
having it as a manual operation would allow us to assume (and document
it this way) that the admin only exposes the cluster to the clients
after preloading is done - so we'd have no concurrent changes to worry
about.

> > What do you guys think? Sanne, I'm particularly interested how you
> > think option a) would fit with the query module.
> >
> > Cheers
> > Dan
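
For reference, the startup sequence proposed with the ISPN-1586 fix (steps 1 to 4 quoted above) can be sketched as three yes/no decisions per node. This is only an illustration under stated assumptions: the class PreloadPlan, its method names and the three boolean flags are hypothetical and are not Infinispan APIs or configuration attributes.

```java
// Hypothetical sketch of the ISPN-1586 startup decisions; not Infinispan code.
final class PreloadPlan {

    // Step 2: a joiner that will fetch state clears its private (non-shared)
    // store, so keys removed by the running members cannot be resurrected.
    static boolean shouldClearStore(boolean firstNode, boolean fetchState, boolean sharedStore) {
        return !firstNode && fetchState && !sharedStore;
    }

    // Step 3: preload only on the first node to start the cache,
    // or when state fetching is disabled.
    static boolean shouldPreload(boolean firstNode, boolean fetchState) {
        return firstNode || !fetchState;
    }

    // Step 4: state transfer only makes sense if there are other members
    // to fetch from and fetching state is enabled.
    static boolean shouldFetchState(boolean firstNode, boolean fetchState) {
        return !firstNode && fetchState;
    }

    public static void main(String[] args) {
        // Walk through all flag combinations for a non-shared store.
        for (boolean first : new boolean[] {true, false}) {
            for (boolean fetch : new boolean[] {true, false}) {
                System.out.println("firstNode=" + first + " fetchState=" + fetch
                        + " -> clear=" + shouldClearStore(first, fetch, false)
                        + " preload=" + shouldPreload(first, fetch)
                        + " transfer=" + shouldFetchState(first, fetch));
            }
        }
    }
}
```

Under these assumptions, a joiner with state fetching enabled and a non-shared store clears the store and never preloads from it, which is where the data loss described in alternative a) comes from.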
_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
