On 10/10/2014 02:38 PM, William Burns wrote: > On Wed, Oct 8, 2014 at 11:19 AM, Radim Vansa <rva...@redhat.com> wrote: >> Users expect that size() will be constant-time (or linear to cluster >> size), and generally fast operation. I'd prefer to keep it that way. >> Though, even the MR way (used for HotRod size() now) needs to crawl >> through all the entries locally. > Many in memory collections require O(n) to do size such as > ConcurrentLinkedQueue, so I wouldn't say size should always be > expected to be constant time or O(c) where c is # of nodes. Granted a > user can expect anything they want.
OK, I stand corrected. Moreover, I was generalizing myself to all users, a common mistake :) Anyway, monitoring tools love nice charts, and I can imagine monitoring software polling every 1 second to update that cool chart with cache size. Do we want a fast but imprecise variant of this operation in some statistics class? Radim > >> 'Heretic, not very well though of and changing too many things' idea: >> what about having data container segment-aware? Then you'd just bcast >> SizeCommand with given topologyId and sum up sizes of primary-owned >> segments... It's not a complete solution, but at least that would enable >> to get the number of locally owned entries quite fast. Though, you can't >> do that easily with cache stores (without changing SPI). >> >> Regarding cache stores, IMO we're damned anyway: when calling >> cacheStore.size(), it can report more entries as those haven't been >> expired yet, it can report less entries as those can be expired due to >> [1]. Or, we'll enumerate all the entries, and that's going to be slow >> (btw., [1] reminded me that we should enumerate both datacontainer AND >> cachestores even if passivation is not enabled). > This is precisely what the distributed iterator does. And also > support for expired entries was recently integrated as I missed that > in the original implementation [a] > > [a] https://issues.jboss.org/browse/ISPN-4643 > >> Radim >> >> [1] https://issues.jboss.org/browse/ISPN-3202 >> >> On 10/08/2014 04:42 PM, William Burns wrote: >>> So it seems we would want to change this for 7.0 if possible since it >>> would be a bigger change for something like 7.1 and 8.0 would be even >>> further out. I should be able to put this together for CR2. >>> >>> It seems that we want to implement keySet, values and entrySet methods >>> using the entry iterator approach. >>> >>> It is however unclear for the size method if we want to use MR entry >>> counting and not worry about the rehash and passivation issues since >>> it is just an estimation anyways. Or if we want to also use the entry >>> iterator which should be closer approximation but will require more >>> network overhead and memory usage. >>> >>> Also we didn't really talk about the fact that these methods would >>> ignore ongoing transactions and if that is a concern or not. >>> >>> - Will >>> >>> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus <mmar...@redhat.com> wrote: >>>> On Oct 8, 2014, at 15:11, Dan Berindei <dan.berin...@gmail.com> wrote: >>>> >>>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus <mmar...@redhat.com> wrote: >>>>> On Oct 3, 2014, at 9:30, Radim Vansa <rva...@redhat.com> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> recently we had a discussion about what size() returns, but I've >>>>>> realized there are more things that users would like to know. My >>>>>> question is whether you think that they would really appreciate it, or >>>>>> whether it's just my QA point of view where I sometimes compute the >>>>>> 'checksums' of cache to see if I didn't lost anything. >>>>>> >>>>>> There are those sizes: >>>>>> A) number of owned entries >>>>>> B) number of entries stored locally in memory >>>>>> C) number of entries stored in each local cache store >>>>>> D) number of entries stored in each shared cache store >>>>>> E) total number of entries in cache >>>>>> >>>>>> So far, we can get >>>>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size() >>>>>> E via distributed iterators / MR >>>>>> A via data container iteration + distribution manager query, but only >>>>>> without cache store >>>>>> C or D through >>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>>>> >>>>>> I think that it would go along with users' expectations if size() >>>>>> returned E and for the rest we should have special methods on >>>>>> AdvancedCache. That would of course change the meaning of size(), but >>>>>> I'd say that finally to something that has firm meaning. >>>>>> >>>>>> WDYT? >>>>> There was a lot of arguments in past whether size() and other methods >>>>> that operate over all the elements (keySet, values) are useful because: >>>>> - they are approximate (data changes during iteration) >>>>> - they are very resource consuming and might be miss-used (this is the >>>>> reason we chosen to use size() with its current local semantic) >>>>> >>>>> These methods (size, keys, values) are useful for people and I think we >>>>> were not wise to implement them only on top of the local data: this is >>>>> like preferring efficiency over correctness. This also created a lot of >>>>> confusion with our users, question like size() doesn't return the correct >>>>> value being asked regularly. I totally agree that size() returns E (i.e. >>>>> everything that is stored within the grid, including persistence) and >>>>> it's performance implications to be documented accordingly. For keySet >>>>> and values - we should stop implementing them (throw exception) and point >>>>> users to Will's distributed iterator which is a nicer way to achieve the >>>>> desired behavior. >>>>> >>>>> We can also implement keySet() and values() on top of the distributed >>>>> entry iterator and document that using the iterator directly is better. >>>> Yes, that's what I meant as well. >>>> >>>> Cheers, >>>> -- >>>> Mircea Markus >>>> Infinispan lead (www.infinispan.org) >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev@lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev@lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> -- >> Radim Vansa <rva...@redhat.com> >> JBoss DataGrid QA >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev@lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev@lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa <rva...@redhat.com> JBoss DataGrid QA _______________________________________________ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev