On Wed, Oct 8, 2014 at 6:14 PM, William Burns <[email protected]> wrote:
> On Wed, Oct 8, 2014 at 10:57 AM, Dan Berindei <[email protected]> > wrote: > > > > > > On Wed, Oct 8, 2014 at 5:42 PM, William Burns <[email protected]> > wrote: > >> > >> So it seems we would want to change this for 7.0 if possible since it > >> would be a bigger change for something like 7.1 and 8.0 would be even > >> further out. I should be able to put this together for CR2. > > > > > > I'm not 100% convinced that we need it for 7.x. For 8.0 I would recommend > > removing the size() method altogether, and providing some looser > > "statistics" instead. > > Yeah I guess I don't know enough about the demand for these methods or > what people wanted to use them for to know what kind of priority they > should be given. > > It sounds like you are talking about decoupling from the > Map/ConcurrentMap interface completely then, right? So we would also > eliminate the other bulk methods (keySet, values, entrySet)? > Yes, I would base the Cache interface on JSR-107's Cache, which doesn't have size() or the other methods. > > > > >> > >> > >> It seems that we want to implement keySet, values and entrySet methods > >> using the entry iterator approach. > >> > >> It is however unclear for the size method if we want to use MR entry > >> counting and not worry about the rehash and passivation issues since > >> it is just an estimation anyways. Or if we want to also use the entry > >> iterator which should be closer approximation but will require more > >> network overhead and memory usage. > > > > > > +1 to use the entry iterator from me, ignoring state transfer we can get > > some pretty wild fluctuations in the size of the cache. > > That is personally my feeling as well, but I tend to err more on the > side of correctness to begin with. > > > We could use a distributed task for Cache.isEmpty() instead of size() == > 0, > > though. > > Yes that should be a good optimization either way. > > > > >> > >> > >> Also we didn't really talk about the fact that these methods would > >> ignore ongoing transactions and if that is a concern or not. > >> > > > > It might be a concern for the Hibernate 2LC impl, it was their TCK that > > prompted the last round of discussions about clear(). > > Although I wonder how much these methods are even used since they only > work for Local, Replication or Invalidation caches in their current > state (and didn't even use loaders until 6.0). > There is some more information about the test in the mailing list discussion [1] There's also a JIRA for clear() [2] I think 2LC almost never uses distribution, so size() being local-only didn't matter, but making it non-tx could cause problems - at least for that particular test. [1] http://lists.jboss.org/pipermail/infinispan-dev/2013-October/013914.html [2] https://issues.jboss.org/browse/ISPN-3656 > > > > We haven't talked about what size(), keySet() and values() should return > for > > an invalidation cache either... I forget, does the distributed entry > > iterator work with invalidation caches? > > It works the same as a local cache so only the local node contents are > returned. Replicated does the same thing, distributed is the only > special case. This was the only thing that made sense to me, but if > you have any ideas that would be great to hear for possibly enhancing > Invalidation iteration. > Sounds good to me. cache.get(k) will search on all the nodes via ClusterLoader, so there is a certain appeal in making the entry iterator do the same. But invalidation caches are used with an external (non-CacheLoader) source of data anyway, so we can never return "all the entries". > > > > > >> > >> - Will > >> > >> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus <[email protected]> > wrote: > >> > > >> > On Oct 8, 2014, at 15:11, Dan Berindei <[email protected]> > wrote: > >> > > >> >> > >> >> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus <[email protected]> > >> >> wrote: > >> >> On Oct 3, 2014, at 9:30, Radim Vansa <[email protected]> wrote: > >> >> > >> >> > Hi, > >> >> > > >> >> > recently we had a discussion about what size() returns, but I've > >> >> > realized there are more things that users would like to know. My > >> >> > question is whether you think that they would really appreciate it, > >> >> > or > >> >> > whether it's just my QA point of view where I sometimes compute the > >> >> > 'checksums' of cache to see if I didn't lost anything. > >> >> > > >> >> > There are those sizes: > >> >> > A) number of owned entries > >> >> > B) number of entries stored locally in memory > >> >> > C) number of entries stored in each local cache store > >> >> > D) number of entries stored in each shared cache store > >> >> > E) total number of entries in cache > >> >> > > >> >> > So far, we can get > >> >> > B via withFlags(SKIP_CACHE_LOAD).size() > >> >> > (passivation ? B : 0) + firstNonZero(C, D) via size() > >> >> > E via distributed iterators / MR > >> >> > A via data container iteration + distribution manager query, but > only > >> >> > without cache store > >> >> > C or D through > >> >> > > >> >> > > getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() > >> >> > > >> >> > I think that it would go along with users' expectations if size() > >> >> > returned E and for the rest we should have special methods on > >> >> > AdvancedCache. That would of course change the meaning of size(), > but > >> >> > I'd say that finally to something that has firm meaning. > >> >> > > >> >> > WDYT? > >> >> > >> >> There was a lot of arguments in past whether size() and other methods > >> >> that operate over all the elements (keySet, values) are useful > because: > >> >> - they are approximate (data changes during iteration) > >> >> - they are very resource consuming and might be miss-used (this is > the > >> >> reason we chosen to use size() with its current local semantic) > >> >> > >> >> These methods (size, keys, values) are useful for people and I think > we > >> >> were not wise to implement them only on top of the local data: this > is like > >> >> preferring efficiency over correctness. This also created a lot of > confusion > >> >> with our users, question like size() doesn't return the correct > value being > >> >> asked regularly. I totally agree that size() returns E (i.e. > everything that > >> >> is stored within the grid, including persistence) and it's > performance > >> >> implications to be documented accordingly. For keySet and values - > we should > >> >> stop implementing them (throw exception) and point users to Will's > >> >> distributed iterator which is a nicer way to achieve the desired > behavior. > >> >> > >> >> We can also implement keySet() and values() on top of the distributed > >> >> entry iterator and document that using the iterator directly is > better. > >> > > >> > Yes, that's what I meant as well. > >> > > >> > Cheers, > >> > -- > >> > Mircea Markus > >> > Infinispan lead (www.infinispan.org) > >> > > >> > > >> > > >> > > >> > > >> > _______________________________________________ > >> > infinispan-dev mailing list > >> > [email protected] > >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> [email protected] > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > [email protected] > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > [email protected] > https://lists.jboss.org/mailman/listinfo/infinispan-dev >
_______________________________________________ infinispan-dev mailing list [email protected] https://lists.jboss.org/mailman/listinfo/infinispan-dev
