On Wed, Oct 8, 2014 at 7:53 PM, William Burns <[email protected]> wrote:
> On Wed, Oct 8, 2014 at 12:41 PM, Dan Berindei <[email protected]> > wrote: > > > > > > On Wed, Oct 8, 2014 at 6:19 PM, Radim Vansa <[email protected]> wrote: > >> > >> Users expect that size() will be constant-time (or linear to cluster > >> size), and generally fast operation. I'd prefer to keep it that way. > >> Though, even the MR way (used for HotRod size() now) needs to crawl > >> through all the entries locally. > > > > > > They might expect that, but there is nothing in the Map API suggesting > it. > > > >> > >> > >> 'Heretic, not very well though of and changing too many things' idea: > >> what about having data container segment-aware? Then you'd just bcast > >> SizeCommand with given topologyId and sum up sizes of primary-owned > >> segments... It's not a complete solution, but at least that would enable > >> to get the number of locally owned entries quite fast. Though, you can't > >> do that easily with cache stores (without changing SPI). > > > > > > We could create a separate DataContainer for each segment. But would it > > really be worth the trouble? I don't know of anyone using size() for > > something other than checking that their data was properly loaded into > the > > cache, and they don't need a super-fast size() for that. > > Having a DataContainer per segment would actually reduce required > memory usage for the distributed iterator as well, since we can query > data by segment much more efficiently and close out segments one by > one per node instead of having to keep multiple open at once. When I > asked about this before it was kind of a we can deal with it later > kind thing. I would think this would increase ST operation time as > well. > You mean it would improve ST performance, because it wouldn't have to compute the hash of each key in the data container? I don't think we have ever considered splitting the data container for ST, as it didn't seem worth the trouble. OTOH we wanted to add a segment-based query to the cache loader SPI every since we started designing NBST :) > > > >> > >> > >> Regarding cache stores, IMO we're damned anyway: when calling > >> cacheStore.size(), it can report more entries as those haven't been > >> expired yet, it can report less entries as those can be expired due to > >> [1]. Or, we'll enumerate all the entries, and that's going to be slow > >> (btw., [1] reminded me that we should enumerate both datacontainer AND > >> cachestores even if passivation is not enabled). > > > > > > Exactly, we need to iterate all the entries from the stores if we want > > something remotely accurate (although I had forgotten about expiration > also > > being a problem). Otherwise we could just leave size() as it is now, it's > > pretty fast :) > > > >> > >> > >> Radim > >> > >> [1] https://issues.jboss.org/browse/ISPN-3202 > >> > >> On 10/08/2014 04:42 PM, William Burns wrote: > >> > So it seems we would want to change this for 7.0 if possible since it > >> > would be a bigger change for something like 7.1 and 8.0 would be even > >> > further out. I should be able to put this together for CR2. > >> > > >> > It seems that we want to implement keySet, values and entrySet methods > >> > using the entry iterator approach. > >> > > >> > It is however unclear for the size method if we want to use MR entry > >> > counting and not worry about the rehash and passivation issues since > >> > it is just an estimation anyways. Or if we want to also use the entry > >> > iterator which should be closer approximation but will require more > >> > network overhead and memory usage. > >> > > >> > Also we didn't really talk about the fact that these methods would > >> > ignore ongoing transactions and if that is a concern or not. > >> > > >> > - Will > >> > > >> > On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus <[email protected]> > >> > wrote: > >> >> On Oct 8, 2014, at 15:11, Dan Berindei <[email protected]> > wrote: > >> >> > >> >>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus <[email protected]> > >> >>> wrote: > >> >>> On Oct 3, 2014, at 9:30, Radim Vansa <[email protected]> wrote: > >> >>> > >> >>>> Hi, > >> >>>> > >> >>>> recently we had a discussion about what size() returns, but I've > >> >>>> realized there are more things that users would like to know. My > >> >>>> question is whether you think that they would really appreciate it, > >> >>>> or > >> >>>> whether it's just my QA point of view where I sometimes compute the > >> >>>> 'checksums' of cache to see if I didn't lost anything. > >> >>>> > >> >>>> There are those sizes: > >> >>>> A) number of owned entries > >> >>>> B) number of entries stored locally in memory > >> >>>> C) number of entries stored in each local cache store > >> >>>> D) number of entries stored in each shared cache store > >> >>>> E) total number of entries in cache > >> >>>> > >> >>>> So far, we can get > >> >>>> B via withFlags(SKIP_CACHE_LOAD).size() > >> >>>> (passivation ? B : 0) + firstNonZero(C, D) via size() > >> >>>> E via distributed iterators / MR > >> >>>> A via data container iteration + distribution manager query, but > only > >> >>>> without cache store > >> >>>> C or D through > >> >>>> > >> >>>> > getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() > >> >>>> > >> >>>> I think that it would go along with users' expectations if size() > >> >>>> returned E and for the rest we should have special methods on > >> >>>> AdvancedCache. That would of course change the meaning of size(), > but > >> >>>> I'd say that finally to something that has firm meaning. > >> >>>> > >> >>>> WDYT? > >> >>> There was a lot of arguments in past whether size() and other > methods > >> >>> that operate over all the elements (keySet, values) are useful > because: > >> >>> - they are approximate (data changes during iteration) > >> >>> - they are very resource consuming and might be miss-used (this is > the > >> >>> reason we chosen to use size() with its current local semantic) > >> >>> > >> >>> These methods (size, keys, values) are useful for people and I think > >> >>> we were not wise to implement them only on top of the local data: > this is > >> >>> like preferring efficiency over correctness. This also created a > lot of > >> >>> confusion with our users, question like size() doesn't return the > correct > >> >>> value being asked regularly. I totally agree that size() returns E > (i.e. > >> >>> everything that is stored within the grid, including persistence) > and it's > >> >>> performance implications to be documented accordingly. For keySet > and values > >> >>> - we should stop implementing them (throw exception) and point > users to > >> >>> Will's distributed iterator which is a nicer way to achieve the > desired > >> >>> behavior. > >> >>> > >> >>> We can also implement keySet() and values() on top of the > distributed > >> >>> entry iterator and document that using the iterator directly is > better. > >> >> Yes, that's what I meant as well. > >> >> > >> >> Cheers, > >> >> -- > >> >> Mircea Markus > >> >> Infinispan lead (www.infinispan.org) > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> _______________________________________________ > >> >> infinispan-dev mailing list > >> >> [email protected] > >> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > _______________________________________________ > >> > infinispan-dev mailing list > >> > [email protected] > >> > https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> > >> -- > >> Radim Vansa <[email protected]> > >> JBoss DataGrid QA > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> [email protected] > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > > > > > _______________________________________________ > > infinispan-dev mailing list > > [email protected] > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > [email protected] > https://lists.jboss.org/mailman/listinfo/infinispan-dev >
_______________________________________________ infinispan-dev mailing list [email protected] https://lists.jboss.org/mailman/listinfo/infinispan-dev
