On Fri, Oct 10, 2014 at 5:20 PM, Mircea Markus <[email protected]> wrote:
> > On Oct 10, 2014, at 15:18, Radim Vansa <[email protected]> wrote: > > > That we should expose that as one method, not forcing people to > > implement the sum() themselves. > > Hmm, isn't the method you mention cache.size() ? :-) > Nope, because we decided to make cache.size() precise-but-slow :) > > > > > And possibly cachestores, again. > > AdvancedCacheLoader has size. > > > > > Radim > > > > On 10/10/2014 04:06 PM, Tristan Tarrant wrote: > >> What's wrong with sum(Datacontainer.size())/numOwners ? > >> > >> Tristan > >> > >> On 10/10/14 16:03, Radim Vansa wrote: > >>> On 10/10/2014 02:38 PM, William Burns wrote: > >>>> On Wed, Oct 8, 2014 at 11:19 AM, Radim Vansa <[email protected]> > wrote: > >>>>> Users expect that size() will be constant-time (or linear to cluster > >>>>> size), and generally fast operation. I'd prefer to keep it that way. > >>>>> Though, even the MR way (used for HotRod size() now) needs to crawl > >>>>> through all the entries locally. > >>>> Many in memory collections require O(n) to do size such as > >>>> ConcurrentLinkedQueue, so I wouldn't say size should always be > >>>> expected to be constant time or O(c) where c is # of nodes. Granted a > >>>> user can expect anything they want. > >>> OK, I stand corrected. Moreover, I was generalizing myself to all > users, > >>> a common mistake :) > >>> > >>> Anyway, monitoring tools love nice charts, and I can imagine monitoring > >>> software polling every 1 second to update that cool chart with cache > >>> size. Do we want a fast but imprecise variant of this operation in some > >>> statistics class? > >>> > >>> Radim > >>> > >>>>> 'Heretic, not very well though of and changing too many things' idea: > >>>>> what about having data container segment-aware? Then you'd just bcast > >>>>> SizeCommand with given topologyId and sum up sizes of primary-owned > >>>>> segments... It's not a complete solution, but at least that would > enable > >>>>> to get the number of locally owned entries quite fast. Though, you > can't > >>>>> do that easily with cache stores (without changing SPI). > >>>>> > >>>>> Regarding cache stores, IMO we're damned anyway: when calling > >>>>> cacheStore.size(), it can report more entries as those haven't been > >>>>> expired yet, it can report less entries as those can be expired due > to > >>>>> [1]. Or, we'll enumerate all the entries, and that's going to be slow > >>>>> (btw., [1] reminded me that we should enumerate both datacontainer > AND > >>>>> cachestores even if passivation is not enabled). > >>>> This is precisely what the distributed iterator does. And also > >>>> support for expired entries was recently integrated as I missed that > >>>> in the original implementation [a] > >>>> > >>>> [a] https://issues.jboss.org/browse/ISPN-4643 > >>>> > >>>>> Radim > >>>>> > >>>>> [1] https://issues.jboss.org/browse/ISPN-3202 > >>>>> > >>>>> On 10/08/2014 04:42 PM, William Burns wrote: > >>>>>> So it seems we would want to change this for 7.0 if possible since > it > >>>>>> would be a bigger change for something like 7.1 and 8.0 would be > even > >>>>>> further out. I should be able to put this together for CR2. > >>>>>> > >>>>>> It seems that we want to implement keySet, values and entrySet > methods > >>>>>> using the entry iterator approach. > >>>>>> > >>>>>> It is however unclear for the size method if we want to use MR entry > >>>>>> counting and not worry about the rehash and passivation issues since > >>>>>> it is just an estimation anyways. Or if we want to also use the > entry > >>>>>> iterator which should be closer approximation but will require more > >>>>>> network overhead and memory usage. > >>>>>> > >>>>>> Also we didn't really talk about the fact that these methods would > >>>>>> ignore ongoing transactions and if that is a concern or not. > >>>>>> > >>>>>> - Will > >>>>>> > >>>>>> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus <[email protected]> > wrote: > >>>>>>> On Oct 8, 2014, at 15:11, Dan Berindei <[email protected]> > wrote: > >>>>>>> > >>>>>>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus <[email protected]> > wrote: > >>>>>>>> On Oct 3, 2014, at 9:30, Radim Vansa <[email protected]> wrote: > >>>>>>>> > >>>>>>>>> Hi, > >>>>>>>>> > >>>>>>>>> recently we had a discussion about what size() returns, but I've > >>>>>>>>> realized there are more things that users would like to know. My > >>>>>>>>> question is whether you think that they would really appreciate > it, or > >>>>>>>>> whether it's just my QA point of view where I sometimes compute > the > >>>>>>>>> 'checksums' of cache to see if I didn't lost anything. > >>>>>>>>> > >>>>>>>>> There are those sizes: > >>>>>>>>> A) number of owned entries > >>>>>>>>> B) number of entries stored locally in memory > >>>>>>>>> C) number of entries stored in each local cache store > >>>>>>>>> D) number of entries stored in each shared cache store > >>>>>>>>> E) total number of entries in cache > >>>>>>>>> > >>>>>>>>> So far, we can get > >>>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size() > >>>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size() > >>>>>>>>> E via distributed iterators / MR > >>>>>>>>> A via data container iteration + distribution manager query, but > only > >>>>>>>>> without cache store > >>>>>>>>> C or D through > >>>>>>>>> > getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() > >>>>>>>>> > >>>>>>>>> I think that it would go along with users' expectations if size() > >>>>>>>>> returned E and for the rest we should have special methods on > >>>>>>>>> AdvancedCache. That would of course change the meaning of > size(), but > >>>>>>>>> I'd say that finally to something that has firm meaning. > >>>>>>>>> > >>>>>>>>> WDYT? > >>>>>>>> There was a lot of arguments in past whether size() and other > methods that operate over all the elements (keySet, values) are useful > because: > >>>>>>>> - they are approximate (data changes during iteration) > >>>>>>>> - they are very resource consuming and might be miss-used (this > is the reason we chosen to use size() with its current local semantic) > >>>>>>>> > >>>>>>>> These methods (size, keys, values) are useful for people and I > think we were not wise to implement them only on top of the local data: > this is like preferring efficiency over correctness. This also created a > lot of confusion with our users, question like size() doesn't return the > correct value being asked regularly. I totally agree that size() returns E > (i.e. everything that is stored within the grid, including persistence) and > it's performance implications to be documented accordingly. For keySet and > values - we should stop implementing them (throw exception) and point users > to Will's distributed iterator which is a nicer way to achieve the desired > behavior. > >>>>>>>> > >>>>>>>> We can also implement keySet() and values() on top of the > distributed entry iterator and document that using the iterator directly is > better. > >>>>>>> Yes, that's what I meant as well. > >>>>>>> > >>>>>>> Cheers, > >>>>>>> -- > >>>>>>> Mircea Markus > >>>>>>> Infinispan lead (www.infinispan.org) > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> infinispan-dev mailing list > >>>>>>> [email protected] > >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>>>>> _______________________________________________ > >>>>>> infinispan-dev mailing list > >>>>>> [email protected] > >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>>>> -- > >>>>> Radim Vansa <[email protected]> > >>>>> JBoss DataGrid QA > >>>>> > >>>>> _______________________________________________ > >>>>> infinispan-dev mailing list > >>>>> [email protected] > >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >>>> _______________________________________________ > >>>> infinispan-dev mailing list > >>>> [email protected] > >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> _______________________________________________ > >> infinispan-dev mailing list > >> [email protected] > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > -- > > Radim Vansa <[email protected]> > > JBoss DataGrid QA > > > > _______________________________________________ > > infinispan-dev mailing list > > [email protected] > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > Cheers, > -- > Mircea Markus > Infinispan lead (www.infinispan.org) > > > > > > _______________________________________________ > infinispan-dev mailing list > [email protected] > https://lists.jboss.org/mailman/listinfo/infinispan-dev >
_______________________________________________ infinispan-dev mailing list [email protected] https://lists.jboss.org/mailman/listinfo/infinispan-dev
