On 10/07/2013 10:44 AM, Dan Berindei wrote:
>
> On Mon, Oct 7, 2013 at 2:30 AM, Sanne Grinovero <[email protected]
> <mailto:[email protected]>> wrote:
>
>     On 6 October 2013 00:01, Pedro Ruivo <[email protected]
>     <mailto:[email protected]>> wrote:
>     > Hi Sanne.
>     >
>     > Thanks for your comments. Please see inline...
>     >
>     > Cheers,
>     > Pedro
>     >
>     > On 10/05/2013 09:15 PM, Sanne Grinovero wrote:
>     >>
>     >> Hi Pedro,
>     >> looks like you're diving into some good fun :-)
>     >> BTW please keep the dev discussions on the mailing list, adding it.
>     >>
>     >> inline :
>     >>
>     >> On 4 October 2013 22:01, Pedro Ruivo <[email protected]
>     <mailto:[email protected]>> wrote:
>     >>>
>     >>> Hi,
>     >>>
>     >>> Sanne, I need your expertise here. I'm afraid that the problem is
>     >>> in FileListOperations :(
>     >>> I think the FileListOperations implementation needs a transactional
>     >>> cache with strong consistency...
>     >>>
>     >>> I'm 99% sure that it is what originates the
>     >>> java.lang.AssertionError: file XPTO does not exist. I found out
>     >>> that we have multiple threads adding and removing files from the
>     >>> list. In the scenario in [1] we see 2 threads loading the key from
>     >>> the cache loader: one thread adds a file and the other removes one.
>     >>> The thread that removes is the last one to commit, so the file list
>     >>> is updated to an old state. When it tries to update an index, I get
>     >>> the assertion error.
>     >>
>     >> Nice, looks like you're onto something.
>     >> I've never seen an AssertionError specifically; it looks like you
>     >> have a new test. Could you share it?
>     >
>     > yes, of course:
>     >
>     > https://github.com/pruivo/infinispan/blob/a4483d08b92d301350823c7fd42725c339a65c7b/query/src/test/java/org/infinispan/query/cacheloaders/CacheStoreTest.java
>     >
>     > so far, only the tests with eviction are failing...
>
>     Some of the failures you're seeing are caused by internal "assert"
>     keywords in Lucene's code, whose purpose is to verify that the
>     "filesystem" is going to be synched properly.
>     These assertions don't apply when using our storage: we don't need
>     this synch to happen. In fact, if it weren't for the assertions, the
>     whole method would be a no-op, as it finally delegates all logic to a
>     method in the Infinispan Directory which is a no-op.
>
>     In other words, these are misleading failures, and we'd need to avoid
>     the TestNG "feature" of enabling assertions in this case.
>
>     Still, even if the stacktrace is misleading, I agree with your
>     diagnosis below.
>
>     Could you reproduce the problem without also involving the Query
>     framework?
>     I'd hope that such a test could be independent and live solely in the
>     lucene-directory module; in practice, if you can only reproduce it
>     with the query module, that makes me suspect we're actually debugging
>     a race condition in the initialization of the two services: a race
>     between the query initialization thread needing to check the index
>     state (so potentially triggering a load from the cache store) and the
>     thread performing the cache store preload.
>     (I see your test also fails without preload, but I'm just wondering if
>     that might be an additional complexity.)
>
>     >>
>     >> Let's step back a second and consider the Cache usage from the
>     >> point of view of FileListOperations.
>     >> Note that even if you have two threads writing at the same time, as
>     >> long as they are on the same node they will be adding/removing
>     >> elements from the same instance of a ConcurrentHashMap.
>     >> Since it's the same instance, it doesn't matter which thread does
>     >> the put operation last: it will push the correct state.
>     >> (There is an assumption here, but we can forget about it for the
>     >> sake of this debugging: same node -> fine, as there is an external
>     >> lock and no other node is allowed to write at the same time.)
>     >>
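Just to make sure we're reasoning about the same pattern, here is the
same-node argument above as a minimal, self-contained sketch (hypothetical
names, plain Java, not the actual FileListOperations code):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Happy path: both threads obtain the SAME set instance, so concurrent
// add/remove land in one object, and whichever put() runs last publishes
// a state that already contains both updates.
public class SameInstanceSketch {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, Set<String>> cache = new ConcurrentHashMap<>();
        String key = "fileList"; // hypothetical metadata key
        Set<String> initial = ConcurrentHashMap.newKeySet();
        initial.add("segment_1");
        cache.put(key, initial);

        Thread adder = new Thread(() -> {
            Set<String> files = cache.get(key); // same instance
            files.add("segment_2");
            cache.put(key, files); // re-publishes the same object
        });
        Thread remover = new Thread(() -> {
            Set<String> files = cache.get(key); // same instance
            files.remove("segment_1");
            cache.put(key, files);
        });
        adder.start(); remover.start();
        adder.join(); remover.join();

        System.out.println(cache.get(key)); // always [segment_2]
    }
}

As soon as the two gets can return two different instances, the last put()
silently discards the other thread's update, which is exactly the failure
mode I describe below.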
>     > 100% agreed with you, but with a cache store we no longer ensure
>     > that 2 (or more) threads are pointing to the same instance of
>     > ConcurrentHashSet.
>     >
>     > With eviction, the entries are removed from the in-memory container
>     > and persisted in the cache store. In the scenario I've described, 2
>     > threads are trying to add/remove a file and the file list does not
>     > exist in-memory. So, each thread will read from the cache store and
>     > deserialize the byte array. In the end, each thread will have a
>     > pointer to a different instance of ConcurrentHashSet, but with the
>     > same elements. And when this happens, we lose one of the operations.
>
>     I'm seeing more than a couple of different smelly behaviors
>     interacting here:
>
>     1## The single-instance ConcurrentHashSet
>     I guess this could be re-thought, as it's making some strong
>     assumptions, but considering this service can't be transactional I'd
>     rather explore other solutions first, as I think the following changes
>     should be good enough.
>
>     2## Eviction not picking the right entry
>     This single key is literally read for each and every performed query,
>     and for all writes as well: each write will write to this key.
>     Even with eviction enabled on the cache, I would never expect this key
>     to actually be evicted!
>
>     # Action 1: Open an issue to investigate the eviction choice: the
>     strategy seems to be doing a very poor job (or maybe it's just that
>     maxEntries(10) is too low and makes LIRS degenerate into insane
>     choices).
>
> That may well be the case: the default concurrency level is 32, so with a
> capacity of 10 you will have just 1 entry per segment. If there is more
> than one entry in the same segment and they are both "hot", they will
> take turns being evicted.
>
> The only thing we could change here is to make LIRS/LRU work on the
> entire container at once, but right now I don't think they're scalable
> enough.
>
>     # Action 2: I think that for now we could disallow usage of eviction
>     on the metadata cache. I didn't have tests using it, as I wouldn't
>     recommend such a configuration: these entries are very hot and very
>     small. Viable to make it an illegal option?
>
>     3## The CacheLoader loading the same entry multiple times in parallel
>     Kudos for finding out that there are situations in which we deal with
>     multiple different instances of ConcurrentHashSet! Still, I think that
>     Infinispan core is terribly wrong in this case: from the client code
>     POV, a new CHS is created with a put-if-absent atomic operation, and I
>     will assume that core will check/load any enabled cache store as well.
>     To handle multiple GET operations in parallel, or even in parallel
>     with preload or the client's put-if-absent operation, I would *not*
>     expect Infinispan core to storm the CacheStore implementation with
>     multiple load operations on the same key: a lock should be held on the
>     potential key during such a load operation.
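That per-key locking is exactly what I'd expect too. To illustrate the
idea, a minimal sketch with toy stand-ins (this is not the real
data-container or CacheStore SPI):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of "a lock should be held on the potential key during load":
// concurrent readers of a missing key trigger ONE store load, and all of
// them observe the single resulting instance.
public class SingleLoadPerKey {
    interface Store { Object load(String key); } // toy stand-in, not the real SPI

    private final Map<String, Object> dataContainer = new ConcurrentHashMap<>();
    private final Store store;

    SingleLoadPerKey(Store store) { this.store = store; }

    Object get(String key) {
        // computeIfAbsent runs the mapping function under the key's lock,
        // so parallel get()s for the same absent key block here and then
        // all see the same loaded object instead of loading it N times.
        return dataContainer.computeIfAbsent(key, store::load);
    }
}

(A real implementation would also need to remember "key is not in the
store" results, since ConcurrentHashMap cannot cache the null returned for
a missing entry; that is the unnecessary-IO-on-null case mentioned just
below.)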
>     If your observation is right, this could also be one of the reasons
>     for so many complaints about CacheStore performance: under these
>     circumstances - which I'd guesstimate are quite common - we're doing
>     lots of unnecessary IO, potentially stressing the slower storage
>     devices. This could even have dramatic effects if there are frequent
>     requests for entries for which the returned value is null.
>
>     # Action 3: Investigate and open a JIRA about the missing locking on
>     CacheStore load operations.
>
>     If this were resolved, we would still have a guarantee of a single
>     instance, am I correct?
>
> I don't think so.
>
> Consider this scenario:
>
> 1. thread 1: set1 = cache.get(key)
> 2. eviction thread: evict/passivate
> 3. thread 2: set2 = cache.get(key)
>
> We don't know if a thread is still holding a reference to a value during
> eviction, so we can't guarantee that there will always be a single
> instance of that value.
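That interleaving is exactly what my test hits. It's easy to see in
isolation: a toy version with plain java.io serialization standing in for
passivation (not Infinispan code):

import java.io.*;
import java.util.HashSet;
import java.util.Set;

// Once the value has been passivated to a byte[], the next get() must
// deserialize, so set1 and set2 become distinct objects and one of the
// two updates is lost when the other instance is written back.
public class EvictionTwoInstances {
    public static void main(String[] args) throws Exception {
        Set<String> value = new HashSet<>();
        value.add("segment_1");

        Set<String> set1 = value;         // 1. thread 1: cache.get(key)
        byte[] stored = serialize(value); // 2. eviction thread: evict/passivate
        @SuppressWarnings("unchecked")
        Set<String> set2 = (Set<String>) deserialize(stored); // 3. thread 2: cache.get(key)

        set1.add("segment_2");    // thread 1 adds a file
        set2.remove("segment_1"); // thread 2 removes a file

        System.out.println(set1 == set2); // false: two instances
        System.out.println("set1=" + set1 + ", set2=" + set2); // last put() wins
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}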
> OTOH, if you need this guarantee, and you don't really need eviction,
> wouldn't it be simpler to use a plain ConcurrentHashMap instead?

I don't think so, because it needs to be replicated: the directory only has
one node writing to that key, and the remaining nodes are read-only. They
still need to have access to that list. Correct me if I'm wrong, Sanne.

>
>     > Also, the problem is easily reproduced when you enable storeAsBinary
>     > for values, because each cache.get will deserialize the byte array
>     > and create different instances.
>
>     That's another option that I've never used on a cache meant to store a
>     Lucene index, as it doesn't make sense for that purpose.
>     But you're right, that's an assumption I hadn't thought of:
>
>     # Action 4: What do you think of promoting that to an illegal
>     configuration?

created ISPN-3593 (see the configuration sketch at the end of this mail)

>
>     > That's why I think we would need a transaction.
>
>     -1 the Directory cannot drive a transaction, as we hope some day in
>     the future to include changes to the index in a user transaction.
>
>     Cheers,
>     Sanne
>     _______________________________________________
>     infinispan-dev mailing list
>     [email protected] <mailto:[email protected]>
>     https://lists.jboss.org/mailman/listinfo/infinispan-dev
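P.S.: for Action 4 / ISPN-3593, this is the kind of configuration that
triggers the extra deserialization. A sketch from memory of the
programmatic API, so treat the exact builder method names as approximate:

import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

// With storeAsBinary enabled for values, the cache keeps the value as a
// byte[] and every cache.get() deserializes a fresh instance - precisely
// the multiple-instances situation described above, even without eviction.
public class MetadataCacheConfig {
    public static Configuration shouldBecomeIllegal() {
        return new ConfigurationBuilder()
                .storeAsBinary()
                    .enable()
                    .storeValuesAsBinary(true) // method name from memory
                .build();
    }
}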
