Considering all these very valid concerns, I'd return to my proposal of throwing runtime exceptions via an (optional) decorator.
I'd have such a decorator in place by default, so that we make it very
clear that - while you can remove it - the behaviour of such methods is
"unusual" and that a user would be better off avoiding them unless he's
into the advanced stuff.

As said before, that worked very well for me in the past, and it was
great that - even while I did know - I had a safety guard to highlight
unintended refactorings by others on my team who didn't know the black
art of using Infinispan correctly.

Sanne

On 7 October 2014 13:43, Radim Vansa <rva...@redhat.com> wrote:
> On 10/07/2014 02:21 PM, William Burns wrote:
>> On Tue, Oct 7, 2014 at 7:32 AM, Radim Vansa <rva...@redhat.com> wrote:
>>> If you have one local and one shared cache store, how should the
>>> command behave?
>>>
>>> a) distexec/MR sum of cache.withFlags(SKIP_REMOTE_LOOKUP,
>>> SKIP_BACKUP_ENTRIES).size() from all nodes? (note that there's no
>>> SKIP_BACKUP_ENTRIES flag right now), where this method returns
>>> localStore.size() for the first non-shared cache store +
>>> (passivation ? dataContainer.size(SKIP_BACKUP_ENTRIES) : 0)
>> Calling the size method in either distexec or MR will give you
>> inflated numbers, as you need to pay attention to numOwners to get
>> a proper count.
>
> That's what I meant by SKIP_BACKUP_ENTRIES - the dataContainer should
> be able to report only primary-owned entries, or we have to iterate
> and apply the filtering outside.
>
>>
>>> b) distexec/MR sum of sharedStore.size() +
>>> (passivation ? sum of dataContainer.size(SKIP_BACKUP_ENTRIES) : 0)
>> Calling size on a shared cache store actually should work somewhat
>> well (assuming all entries are stored in the shared cache). The
>> problem is if passivation is enabled, as you point out, because you
>> also have to check the data container, which means you can also have
>> an issue with concurrent activations and passivations (which you
>> can't verify properly in either case without knowing the keys).
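Will's point about inflated counts can be illustrated without Infinispan: when every entry lives on numOwners nodes, a distexec/MR-style sum of per-node sizes overcounts by exactly that factor. A toy Java simulation (all names here are hypothetical, and as noted above a SKIP_BACKUP_ENTRIES flag does not actually exist; this only models the arithmetic):

```java
import java.util.*;

public class OwnerCounting {
    // Build per-node "data containers": key k lives on numOwners
    // consecutive nodes starting at its primary owner (k % nodes).
    static List<Set<Integer>> containers(int nodes, int numOwners, int entries) {
        List<Set<Integer>> c = new ArrayList<>();
        for (int n = 0; n < nodes; n++) c.add(new HashSet<>());
        for (int key = 0; key < entries; key++) {
            for (int i = 0; i < numOwners; i++) {
                c.get((key + i) % nodes).add(key);
            }
        }
        return c;
    }

    // Naive distexec/MR-style sum of local sizes: counts every copy.
    static int naiveSize(int nodes, int numOwners, int entries) {
        return containers(nodes, numOwners, entries).stream()
                .mapToInt(Set::size).sum();
    }

    // What a hypothetical SKIP_BACKUP_ENTRIES filter would give:
    // each key is counted only on its primary owner.
    static int primarySize(int nodes, int numOwners, int entries) {
        List<Set<Integer>> c = containers(nodes, numOwners, entries);
        int count = 0;
        for (int n = 0; n < nodes; n++) {
            for (int key : c.get(n)) {
                if (key % nodes == n) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(naiveSize(4, 2, 100));   // 200: inflated by numOwners
        System.out.println(primarySize(4, 2, 100)); // 100: the real entry count
    }
}
```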
>>
>>> c) MR that would count the entries
>> This is the only reliable way to do this with MR. And unfortunately,
>> if a rehash occurs, I am not sure whether you would get inconsistent
>> numbers or an Exception. In the latter case at least you should be
>> able to make sure that you have the proper number when it does return
>> without an exception. I can't say how it works with multiple loaders
>> though; my guess is that it may process the same entry more than once,
>> so it depends on whether your mapper is smart enough to realize it.
>
> I don't think that reporting an incorrect size is *that* harmful -
> even the ConcurrentMap interface says that it's just a wild guess and
> that when things are changing, you can't rely on it.
>
>>
>>> d) wrapper on distributed entry iteration with converters set to
>>> return 0-sized entries
>> The entry iterator can't return 0-sized entries (just the values).
>> The keys are required to make sure that the count is correct and
>> also to ensure that if a rehash happens in the middle it can properly
>> continue to operate without having to start over. The entry iterator
>> should work properly irrespective of the number of stores/loaders
>> that are configured, since it keeps track of already-seen keys (so
>> duplicates are ignored).
>
> Ok, I was simplifying that a bit. And by the way, I don't really like
> the fact that for distributed entry iteration you need to be able to
> keep all keys from one segment in memory at one moment. But fine -
> distributed entry iteration is probably not the right way.
>
>>
>>
>>> And what about nodes with different configurations?
>> Hard to know without knowing what the differences are.
>
> I had in mind different loader and passivation configurations (e.g.
> some nodes could use a shared store and some don't - do we want to
> handle such obscure configs? Can we design this without the need for
> complicated decision trees about what to include and what not?).
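The deduplication behaviour Will describes for the entry iterator can be sketched in plain Java: a set of already-seen keys makes a key stored by both a local and a shared loader count only once. This is illustrative code, not the actual iterator implementation:

```java
import java.util.*;

public class DistinctCount {
    // Count entries reported by several loaders, ignoring duplicate
    // keys - the same strategy as tracking a set of already-seen keys.
    static int countDistinct(List<List<String>> loaders) {
        Set<String> seen = new HashSet<>();
        int count = 0;
        for (List<String> loader : loaders) {
            for (String key : loader) {
                if (seen.add(key)) count++; // only the first sighting counts
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Keys "b" and "c" appear in both a local and a shared store,
        // but must be counted once each.
        List<List<String>> loaders = List.of(
                List.of("a", "b", "c"),   // local store
                List.of("b", "c", "d"));  // shared store
        System.out.println(countDistinct(loaders)); // 4
    }
}
```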
>
> Radim
>
>>
>>> Radim
>>>
>>> On 10/06/2014 01:57 PM, Sanne Grinovero wrote:
>>>> On 6 October 2014 12:44, Tristan Tarrant <ttarr...@redhat.com> wrote:
>>>>> I think we should provide correct implementations of size() (and
>>>>> others) and provide shortcut implementations using our usual Flag
>>>>> API (e.g. SKIP_REMOTE_LOOKUP).
>>>> Right, that would be very nice. Same for CacheStore interaction:
>>>> all cache stores should be included unless skipped explicitly.
>>>>
>>>> Sanne
>>>>
>>>>> Tristan
>>>>>
>>>>> On 06/10/14 12:57, Sanne Grinovero wrote:
>>>>>> On 3 October 2014 18:38, Dennis Reed <der...@redhat.com> wrote:
>>>>>>> Since size() is defined by the ConcurrentMap interface, it
>>>>>>> already has a precisely defined meaning. The only "correct"
>>>>>>> implementation is E.
>>>>>> +1
>> This is one of the things I have been wanting to do: actually
>> implement the other Map methods across the entire cache. However, to
>> do a lot of these in a memory-conscious way they would need to be run
>> ignoring any ongoing transactions. Actually, having this requirement
>> allows these methods to be implemented quite easily, especially in
>> conjunction with the EntryIterator. I almost made a PR for it a while
>> back, but it seemed a little zealous to do at the same time and it
>> didn't seem that people were pushing for it very hard (maybe that was
>> a wrong assumption). Also, I wasn't quite sure whether the
>> transactional part no longer being functional would be a deterrent.
>>
>>>>>>> The current non-correct implementation was just because it's
>>>>>>> expensive to calculate correctly. I'm not sure the current impl
>>>>>>> is really that useful for anything.
>>>>>> +1
>>>>>>
>>>>>> And not just size() but many others from ConcurrentMap.
>>>>>> The question is whether we should drop the interface and all the
>>>>>> methods which aren't efficiently implementable, or fix all those
>>>>>> methods.
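Will's idea of layering the remaining Map methods on a fresh entry iteration, deliberately ignoring ongoing transactions, might look roughly like this in plain Java. The Supplier stands in for starting a new (distributed) iteration, and all names here are made up for illustration:

```java
import java.util.*;
import java.util.function.Supplier;

public class IterationBackedOps {
    // Sketch: size() and containsValue() computed by streaming the
    // entries, without consulting any transactional context. Each call
    // obtains a fresh iterator from the supplier.
    static <K, V> int size(Supplier<Iterator<Map.Entry<K, V>>> entries) {
        int n = 0;
        for (Iterator<Map.Entry<K, V>> it = entries.get(); it.hasNext(); it.next()) {
            n++;
        }
        return n;
    }

    static <K, V> boolean containsValue(
            Supplier<Iterator<Map.Entry<K, V>>> entries, V value) {
        for (Iterator<Map.Entry<K, V>> it = entries.get(); it.hasNext(); ) {
            if (Objects.equals(it.next().getValue(), value)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Integer> data = Map.of("a", 1, "b", 2);
        System.out.println(size(() -> data.entrySet().iterator()));             // 2
        System.out.println(containsValue(() -> data.entrySet().iterator(), 2)); // true
    }
}
```

Both operations are necessarily full scans, which is exactly why the thread debates whether they belong on the default interface at all.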
>>>>>>
>>>>>> In the past I loved that I could inject "Infinispan superpowers"
>>>>>> into an application making extensive use of Map and ConcurrentMap
>>>>>> without changes, but that has been deceiving and required great
>>>>>> care, such as verifying that these features would not be used
>>>>>> anywhere in the code. I ended up wrapping the Cache implementation
>>>>>> in a custom adapter which would also implement ConcurrentMap but
>>>>>> would throw a RuntimeException if any of the "unallowed" methods
>>>>>> was called; at least I would detect violations safely.
>>>>>>
>>>>>> I still think that for the time being - until a better solution is
>>>>>> planned - we should throw exceptions... alas, that's an old
>>>>>> conversation and it was never done.
>>>>>>
>>>>>> Sanne
>>>>>>
>>>>>>
>>>>>>> -Dennis
>>>>>>>
>>>>>>> On 10/03/2014 03:30 AM, Radim Vansa wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> recently we had a discussion about what size() returns, but I've
>>>>>>>> realized there are more things that users would like to know. My
>>>>>>>> question is whether you think that they would really appreciate
>>>>>>>> it, or whether it's just my QA point of view where I sometimes
>>>>>>>> compute the 'checksums' of caches to see if I didn't lose
>>>>>>>> anything.
>>>>>>>>
>>>>>>>> There are these sizes:
>>>>>>>> A) number of owned entries
>>>>>>>> B) number of entries stored locally in memory
>>>>>>>> C) number of entries stored in each local cache store
>>>>>>>> D) number of entries stored in each shared cache store
>>>>>>>> E) total number of entries in the cache
>>>>>>>>
>>>>>>>> So far, we can get
>>>>>>>> B via withFlags(SKIP_CACHE_LOAD).size()
>>>>>>>> (passivation ? B : 0) + firstNonZero(C, D) via size()
>>>>>>>> E via distributed iterators / MR
>>>>>>>> A via data container iteration + distribution manager query, but
>>>>>>>> only without a cache store
>>>>>>>> C or D through
>>>>>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores()
>>>>>>>>
>>>>>>>> I think that it would go along with users' expectations if
>>>>>>>> size() returned E, and for the rest we should have special
>>>>>>>> methods on AdvancedCache. That would of course change the
>>>>>>>> meaning of size(), but I'd say finally to something that has a
>>>>>>>> firm meaning.
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>> Radim
>>>>>>>>
>>>
>>> --
>>> Radim Vansa <rva...@redhat.com>
>>> JBoss DataGrid QA
>>>
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
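For illustration, Sanne's guard adapter - and the default decorator proposed at the top of this message - could look something like the following plain-Java sketch over a ConcurrentHashMap. This is not the actual Infinispan code; it only shows the pattern of delegating the well-defined operations and throwing on the ones a distributed cache cannot honour efficiently:

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Guard adapter: implements ConcurrentMap, delegates cheap per-key
// operations, and throws on methods that would require a full scan.
public class GuardedMap<K, V> implements ConcurrentMap<K, V> {
    private final ConcurrentMap<K, V> delegate;

    public GuardedMap(ConcurrentMap<K, V> delegate) {
        this.delegate = delegate;
    }

    private static UnsupportedOperationException guard(String method) {
        return new UnsupportedOperationException(
                method + "() is disallowed on this cache adapter");
    }

    // Delegated operations - efficient per-key semantics.
    @Override public V get(Object key) { return delegate.get(key); }
    @Override public V put(K key, V value) { return delegate.put(key, value); }
    @Override public V remove(Object key) { return delegate.remove(key); }
    @Override public boolean containsKey(Object key) { return delegate.containsKey(key); }
    @Override public V putIfAbsent(K key, V value) { return delegate.putIfAbsent(key, value); }
    @Override public boolean remove(Object key, Object value) { return delegate.remove(key, value); }
    @Override public boolean replace(K key, V oldValue, V newValue) { return delegate.replace(key, oldValue, newValue); }
    @Override public V replace(K key, V value) { return delegate.replace(key, value); }
    @Override public void putAll(Map<? extends K, ? extends V> m) { delegate.putAll(m); }
    @Override public void clear() { delegate.clear(); }

    // Guarded operations - no efficient distributed implementation.
    @Override public int size() { throw guard("size"); }
    @Override public boolean isEmpty() { throw guard("isEmpty"); }
    @Override public boolean containsValue(Object value) { throw guard("containsValue"); }
    @Override public Set<K> keySet() { throw guard("keySet"); }
    @Override public Collection<V> values() { throw guard("values"); }
    @Override public Set<Entry<K, V>> entrySet() { throw guard("entrySet"); }

    public static void main(String[] args) {
        GuardedMap<String, Integer> map = new GuardedMap<>(new ConcurrentHashMap<>());
        map.put("a", 1);
        System.out.println(map.get("a")); // 1
        try {
            map.size(); // an unintended refactoring would be caught here
        } catch (UnsupportedOperationException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

As discussed in the thread, such a wrapper turns a silently wrong (or expensive) answer into an immediate, detectable failure, which is the point of having it in place by default.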