I meant as a quick implementation of that, not that we want to tell users to do it.
Tristan On 10/10/14 16:18, Radim Vansa wrote: > That we should expose that as one method, not forcing people to > implement the sum() themselves. > > And possibly cachestores, again. > > Radim > > On 10/10/2014 04:06 PM, Tristan Tarrant wrote: >> What's wrong with sum(Datacontainer.size())/numOwners ? >> >> Tristan >> >> On 10/10/14 16:03, Radim Vansa wrote: >>> On 10/10/2014 02:38 PM, William Burns wrote: >>>> On Wed, Oct 8, 2014 at 11:19 AM, Radim Vansa <[email protected]> wrote: >>>>> Users expect that size() will be constant-time (or linear to cluster >>>>> size), and generally fast operation. I'd prefer to keep it that way. >>>>> Though, even the MR way (used for HotRod size() now) needs to crawl >>>>> through all the entries locally. >>>> Many in memory collections require O(n) to do size such as >>>> ConcurrentLinkedQueue, so I wouldn't say size should always be >>>> expected to be constant time or O(c) where c is # of nodes. Granted a >>>> user can expect anything they want. >>> OK, I stand corrected. Moreover, I was generalizing myself to all users, >>> a common mistake :) >>> >>> Anyway, monitoring tools love nice charts, and I can imagine monitoring >>> software polling every 1 second to update that cool chart with cache >>> size. Do we want a fast but imprecise variant of this operation in some >>> statistics class? >>> >>> Radim >>> >>>>> 'Heretic, not very well though of and changing too many things' idea: >>>>> what about having data container segment-aware? Then you'd just bcast >>>>> SizeCommand with given topologyId and sum up sizes of primary-owned >>>>> segments... It's not a complete solution, but at least that would enable >>>>> to get the number of locally owned entries quite fast. Though, you can't >>>>> do that easily with cache stores (without changing SPI). >>>>> >>>>> Regarding cache stores, IMO we're damned anyway: when calling >>>>> cacheStore.size(), it can report more entries as those haven't been >>>>> expired yet, it can report less entries as those can be expired due to >>>>> [1]. Or, we'll enumerate all the entries, and that's going to be slow >>>>> (btw., [1] reminded me that we should enumerate both datacontainer AND >>>>> cachestores even if passivation is not enabled). >>>> This is precisely what the distributed iterator does. And also >>>> support for expired entries was recently integrated as I missed that >>>> in the original implementation [a] >>>> >>>> [a] https://issues.jboss.org/browse/ISPN-4643 >>>> >>>>> Radim >>>>> >>>>> [1] https://issues.jboss.org/browse/ISPN-3202 >>>>> >>>>> On 10/08/2014 04:42 PM, William Burns wrote: >>>>>> So it seems we would want to change this for 7.0 if possible since it >>>>>> would be a bigger change for something like 7.1 and 8.0 would be even >>>>>> further out. I should be able to put this together for CR2. >>>>>> >>>>>> It seems that we want to implement keySet, values and entrySet methods >>>>>> using the entry iterator approach. >>>>>> >>>>>> It is however unclear for the size method if we want to use MR entry >>>>>> counting and not worry about the rehash and passivation issues since >>>>>> it is just an estimation anyways. Or if we want to also use the entry >>>>>> iterator which should be closer approximation but will require more >>>>>> network overhead and memory usage. >>>>>> >>>>>> Also we didn't really talk about the fact that these methods would >>>>>> ignore ongoing transactions and if that is a concern or not. >>>>>> >>>>>> - Will >>>>>> >>>>>> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus <[email protected]> >>>>>> wrote: >>>>>>> On Oct 8, 2014, at 15:11, Dan Berindei <[email protected]> wrote: >>>>>>> >>>>>>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus <[email protected]> >>>>>>>> wrote: >>>>>>>> On Oct 3, 2014, at 9:30, Radim Vansa <[email protected]> wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> recently we had a discussion about what size() returns, but I've >>>>>>>>> realized there are more things that users would like to know. My >>>>>>>>> question is whether you think that they would really appreciate it, or >>>>>>>>> whether it's just my QA point of view where I sometimes compute the >>>>>>>>> 'checksums' of cache to see if I didn't lost anything. >>>>>>>>> >>>>>>>>> There are those sizes: >>>>>>>>> A) number of owned entries >>>>>>>>> B) number of entries stored locally in memory >>>>>>>>> C) number of entries stored in each local cache store >>>>>>>>> D) number of entries stored in each shared cache store >>>>>>>>> E) total number of entries in cache >>>>>>>>> >>>>>>>>> So far, we can get >>>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size() >>>>>>>>> E via distributed iterators / MR >>>>>>>>> A via data container iteration + distribution manager query, but only >>>>>>>>> without cache store >>>>>>>>> C or D through >>>>>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>>>>>>> >>>>>>>>> I think that it would go along with users' expectations if size() >>>>>>>>> returned E and for the rest we should have special methods on >>>>>>>>> AdvancedCache. That would of course change the meaning of size(), but >>>>>>>>> I'd say that finally to something that has firm meaning. >>>>>>>>> >>>>>>>>> WDYT? >>>>>>>> There was a lot of arguments in past whether size() and other methods >>>>>>>> that operate over all the elements (keySet, values) are useful because: >>>>>>>> - they are approximate (data changes during iteration) >>>>>>>> - they are very resource consuming and might be miss-used (this is the >>>>>>>> reason we chosen to use size() with its current local semantic) >>>>>>>> >>>>>>>> These methods (size, keys, values) are useful for people and I think >>>>>>>> we were not wise to implement them only on top of the local data: this >>>>>>>> is like preferring efficiency over correctness. This also created a >>>>>>>> lot of confusion with our users, question like size() doesn't return >>>>>>>> the correct value being asked regularly. I totally agree that size() >>>>>>>> returns E (i.e. everything that is stored within the grid, including >>>>>>>> persistence) and it's performance implications to be documented >>>>>>>> accordingly. For keySet and values - we should stop implementing them >>>>>>>> (throw exception) and point users to Will's distributed iterator which >>>>>>>> is a nicer way to achieve the desired behavior. >>>>>>>> >>>>>>>> We can also implement keySet() and values() on top of the distributed >>>>>>>> entry iterator and document that using the iterator directly is better. >>>>>>> Yes, that's what I meant as well. >>>>>>> >>>>>>> Cheers, >>>>>>> -- >>>>>>> Mircea Markus >>>>>>> Infinispan lead (www.infinispan.org) >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> infinispan-dev mailing list >>>>>>> [email protected] >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> [email protected] >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> -- >>>>> Radim Vansa <[email protected]> >>>>> JBoss DataGrid QA >>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> [email protected] >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> [email protected] >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> [email protected] >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ infinispan-dev mailing list [email protected] https://lists.jboss.org/mailman/listinfo/infinispan-dev
