On 7 October 2013 10:44, Dan Berindei <[email protected]> wrote:
>
> On Mon, Oct 7, 2013 at 2:30 AM, Sanne Grinovero <[email protected]> wrote:
>>
>> On 6 October 2013 00:01, Pedro Ruivo <[email protected]> wrote:
>> > Hi Sanne.
>> >
>> > Thanks for your comments. Please see inline...
>> >
>> > Cheers,
>> > Pedro
>> >
>> > On 10/05/2013 09:15 PM, Sanne Grinovero wrote:
>> >>
>> >> Hi Pedro,
>> >> looks like you're diving into some good fun :-)
>> >> BTW please keep the dev discussions on the mailing list, adding it.
>> >>
>> >> inline:
>> >>
>> >> On 4 October 2013 22:01, Pedro Ruivo <[email protected]> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> Sanne, I need your expertise here. I'm afraid that the problem is in
>> >>> FileListOperations :(
>> >>> I think the FileListOperations implementation needs a transactional
>> >>> cache with strong consistency...
>> >>>
>> >>> I'm 99% sure that it is the origin of the java.lang.AssertionError:
>> >>> "file XPTO does not exist". I found out that we have multiple threads
>> >>> adding and removing files from the list. In the scenario in [1] we
>> >>> see 2 threads loading the key from the cache loader; one thread adds
>> >>> a file and the other removes one. The thread that removes is the last
>> >>> one to commit, so the file list is updated to an old state. When it
>> >>> tries to update an index, I get the assertion error.
>> >>
>> >> Nice, looks like you're on to something.
>> >> I've never seen an AssertionError specifically; looks like you have a
>> >> new test. Could you share it?
>> >
>> > Yes, of course:
>> >
>> > https://github.com/pruivo/infinispan/blob/a4483d08b92d301350823c7fd42725c339a65c7b/query/src/test/java/org/infinispan/query/cacheloaders/CacheStoreTest.java
>> >
>> > So far, only the tests with eviction are failing...
>>
>> Some of the failures you're seeing are caused by the internal "assert"
>> keyword in Lucene's code, whose purpose is to verify that the
>> "filesystem" is going to be synched properly.
>> These assertions don't apply when using our storage: we don't need this
>> synch to happen. In fact, if it weren't for the assertions, the whole
>> method would be a no-op, as it ultimately delegates all logic to a
>> method in the Infinispan Directory which is a no-op.
>>
>> In other words, these are misleading failures and we'd need to avoid
>> the TestNG "feature" of enabling assertions in this case.
>>
>> Still, even if the stacktrace is misleading, I agree with your
>> diagnosis below.
>>
>> Could you reproduce the problem without also involving the Query
>> framework?
>> I'd hope that such a test could be independent and live solely in the
>> lucene-directory module; in practice, if you can only reproduce it with
>> the query module, it makes me suspicious that we're actually debugging
>> a race condition in the initialization of the two services: a race
>> between the query initialization thread needing to check the index
>> state (so potentially triggering a load from the cache store), and the
>> thread performing the cache store preload.
>> (I see your test also fails without preload, but just wondering if
>> that might be an additional complexity.)
>>
>> >> Let's step back a second and consider the Cache usage from the point
>> >> of view of FileListOperations.
>> >> Note that even if you have two threads writing at the same time, as
>> >> long as they are on the same node they will be adding/removing
>> >> elements from the same instance of a ConcurrentHashMap.
>> >> Since it's the same instance, it doesn't matter which thread will do
>> >> the put operation as last: it will push the correct state.
>> >> (There is an assumption here, but we can forget about it for the
>> >> sake of this debugging: same node -> fine, as there is an external
>> >> lock and no other node is allowed to write at the same time.)
>> >
>> > 100% agreed with you, but with a cache store we no longer ensure that
>> > 2 (or more) threads are pointing to the same instance of the
>> > ConcurrentHashSet.
>> >
>> > With eviction, the entries are removed from the in-memory container
>> > and persisted in the cache store. In the scenario I've described,
>> > 2 threads are trying to add/remove a file and the file list does not
>> > exist in memory. So each thread will read from the cache store and
>> > deserialize the byte array. In the end, each thread will hold a
>> > pointer to a different instance of the ConcurrentHashSet, but with
>> > the same elements. And when this happens, we lose one of the
>> > operations.
>>
>> I'm seeing more than a couple of different smelly behaviors interacting
>> here:
>>
>> 1## The single-instance ConcurrentHashSet
>> I guess this could be re-thought as it's making some strong
>> assumptions, but considering this service can't be transactional I'd
>> rather explore other solutions first, as I think the following changes
>> should be good enough.
>>
>> 2## Eviction not picking the right entry
>> This single key is literally read for each and every query, and for
>> all writes as well: each write will write to this key.
>> Even with eviction enabled on the cache, I would never expect this key
>> to actually be evicted!
>>
>> # Action 1: Open an issue to investigate the eviction choice: the
>> strategy seems to be doing a very poor job (or maybe it's just that
>> maxEntries(10) is too low and makes LIRS degenerate into insane
>> choices).
>
> That may well be the case: the default concurrency level is 32, so with
> a capacity of 10 you will have just 1 entry per segment. If there is
> more than one entry in the same segment and they are both "hot", they
> will take turns being evicted.
>
> The only thing we could change here is to make LIRS/LRU work for the
> entire container at once, but right now I don't think they're scalable
> enough.
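To make the race Pedro describes above concrete, here is a minimal,
self-contained sketch of the lost update. Plain JDK types stand in for
the Infinispan cache and cache store, and all class, field and file
names are illustrative:

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Sketch of the lost-update race: the "store" holds the file list, the entry
// is no longer in memory, so each thread gets its own copy (as if it had
// deserialized the stored byte[] itself), mutates it, and writes it back.
public class FileListLostUpdateSketch {

   // Stands in for the cache store.
   static volatile Set<String> storedFileList = newSet("segments_1", "_0.cfs");

   static Set<String> newSet(String... files) {
      Set<String> s = ConcurrentHashMap.newKeySet();
      for (String f : files) s.add(f);
      return s;
   }

   static Set<String> loadFromStore() {          // like cache.get() after eviction:
      return newSet(storedFileList.toArray(new String[0])); // a fresh copy each time
   }

   static void writeToStore(Set<String> list) {  // like cache.put()
      storedFileList = list;
   }

   public static void main(String[] args) throws InterruptedException {
      CountDownLatch bothLoaded = new CountDownLatch(2);

      Thread adder = new Thread(() -> {
         Set<String> copy = loadFromStore();     // instance #1
         bothLoaded.countDown();
         await(bothLoaded);
         copy.add("_1.cfs");                     // register a new file
         writeToStore(copy);
      });

      Thread remover = new Thread(() -> {
         Set<String> copy = loadFromStore();     // instance #2
         bothLoaded.countDown();
         await(bothLoaded);
         copy.remove("_0.cfs");                  // delete an old file
         writeToStore(copy);
      });

      adder.start(); remover.start();
      adder.join(); remover.join();

      // Whichever put ran last wins: the final list either misses "_1.cfs"
      // or still contains "_0.cfs" - never both changes.
      System.out.println("file list in store: " + storedFileList);
   }

   static void await(CountDownLatch latch) {
      try { latch.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
   }
}

Whichever thread writes back last wins, so the surviving list either
misses the newly added file or still contains the deleted one: exactly
the stale file list that trips the assertion.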
Thanks Dan, great insight. No, I don't think we need to improve anything
to make an unrealistic test work; but should we validate against an
unreasonably low maxEntries value? At least log a warning?

On the LIRS investigation proposal, I assume we already have tests which
verify that the choice of which elements get evicted is reasonable?

>> # Action 2: I think that for now we could disallow usage of eviction
>> on the metadata cache. I didn't have tests using it, as I wouldn't
>> recommend such a configuration: these entries are very hot and very
>> small. Viable to make it an illegal option?
>>
>> 3## The CacheLoader loading the same entry multiple times in parallel
>> Kudos for finding out that there are situations in which we deal with
>> multiple different instances of ConcurrentHashSet! Still, I think that
>> Infinispan core is terribly wrong in this case:
>> from the client code POV a new CHS is created with a put-if-absent
>> atomic operation, and I assume that core will check/load any enabled
>> cache store there as well.
>> To handle multiple GET operations in parallel, or even in parallel
>> with preload or the client's put-if-absent operation, I would *not*
>> expect Infinispan core to storm the CacheStore implementation with
>> multiple load operations for the same key: a lock should be held on
>> the potential key during such a load operation.
>>
>> If your observation is right, this could also be one of the reasons
>> for so many complaints about CacheStore performance: under these
>> circumstances - which I'd guesstimate are quite common - we're doing
>> lots of unnecessary IO, potentially stressing the slower storage
>> devices. This could even have dramatic effects if there are frequent
>> requests for entries for which the returned value is null.
>>
>> # Action 3: Investigate and open a JIRA about the missing locking on
>> CacheStore load operations.
>>
>> If this were resolved, we would still have a guarantee of a single
>> instance, am I correct?
>
> I don't think so.
>
> Consider this scenario:
>
> 1. thread 1: set1 = cache.get(key)
> 2. eviction thread: evict/passivate
> 3. thread 2: set2 = cache.get(key)
>
> We don't know if a thread is still holding a reference to a value
> during eviction, so we can't guarantee that there will always be a
> single instance of that value.
>
> OTOH, if you need this guarantee, and you don't really need eviction,
> wouldn't it be simpler to use a plain ConcurrentHashMap instead?

I wish I could, but the CHM instance needs to be shared with other
Directory instances running on the same Cache, so it needs to be
retrieved at least once from the Cache; and I need any new change to
update the copy being stored in the CacheStore, so that in case of a
crash we can reload the correct state.

Your description is indeed an example in which this wouldn't work; the
Directory code is not holding a reference to it for longer than the
method scope, but indeed that might be too long :)

Let's disallow eviction; that should be good enough. Alternatively we
could register the CHM in an internal service to guarantee a unique
in-memory instance across Directory instances sharing the same cache...
but I don't think there is a need for eviction on the metadata.

Sanne

>> > Also, the problem is easily reproduced when you enable storeAsBinary
>> > for values, because each cache.get will deserialize the byte array
>> > and create different instances.
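To sketch what the per-key locking of Action 3 above could look like:
this is a rough illustration of collapsing concurrent cache store loads
for the same key, not how Infinispan core is structured today. The
loadFromStore callback is a hypothetical stand-in for the real store
load:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

// Sketch: the first thread to miss a key installs a FutureTask and performs
// the single store load; any other thread missing the same key while the load
// is in flight waits for that task instead of hitting the store again.
public class CollapsingLoader<K, V> {

   private final ConcurrentMap<K, FutureTask<V>> inFlight = new ConcurrentHashMap<>();
   private final Function<K, V> loadFromStore;   // hypothetical store callback

   public CollapsingLoader(Function<K, V> loadFromStore) {
      this.loadFromStore = loadFromStore;
   }

   public V load(K key) throws InterruptedException {
      FutureTask<V> task = new FutureTask<>(() -> loadFromStore.apply(key));
      FutureTask<V> existing = inFlight.putIfAbsent(key, task);
      if (existing == null) {
         existing = task;
         try {
            task.run();                  // we won the race: perform the only store load
         } finally {
            inFlight.remove(key, task);  // allow a fresh load later (e.g. after eviction)
         }
      }
      try {
         return existing.get();          // every concurrent caller shares this result
      } catch (ExecutionException e) {
         throw new IllegalStateException("store load failed for key " + key, e.getCause());
      }
   }
}

In Infinispan itself this would presumably live behind per-key locking
in the loading interceptor rather than in a separate component; the
sketch only shows the intended effect: one store load per key, and a
shared result for all the threads that were waiting on it.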
>> That's another option that I've never used on a cache meant to store a
>> Lucene index, as it doesn't make sense for that purpose.
>> But you're right, that's an assumption I hadn't thought of:
>>
>> # Action 4: What do you think of promoting that as an illegal
>> configuration? Created ISPN-3593.
>>
>> > That's why I think we would need a transaction.
>>
>> -1, the Directory can not drive a transaction, as we hope some day in
>> the future to include changes to the index in a user transaction.
>>
>> Cheers,
>> Sanne

_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
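A follow-up sketch for Actions 2 and 4 above: what "make it an illegal
option" could look like as a startup check on the metadata cache. The
configuration accessors follow the Infinispan 6.x API as far as I
recall, so treat the exact method names as assumptions rather than a
definitive implementation:

import org.infinispan.Cache;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.eviction.EvictionStrategy;

// Rough sketch of validating the metadata cache configuration when the
// Directory is built (Actions 2 and 4); accessor names may need adjusting
// depending on the Infinispan version.
public class MetadataCacheValidator {

   public static void validate(Cache<?, ?> metadataCache) {
      Configuration cfg = metadataCache.getCacheConfiguration();

      // Action 2: the file-list key is read on every query and written on
      // every index change; evicting it only causes reload/deserialization
      // races, so reject eviction outright.
      if (cfg.eviction().strategy() != EvictionStrategy.NONE) {
         throw new IllegalArgumentException(
               "Lucene Directory metadata cache '" + metadataCache.getName()
               + "' must not use eviction");
      }

      // Action 4 (ISPN-3593): storeAsBinary makes every get() deserialize a
      // fresh copy of the file list, breaking the single-instance assumption.
      if (cfg.storeAsBinary().enabled()) {
         throw new IllegalArgumentException(
               "Lucene Directory metadata cache '" + metadataCache.getName()
               + "' must not enable storeAsBinary");
      }
   }
}

The check would run once, where the Directory wires up its caches;
failing fast on a bad configuration beats debugging a lost file list.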
