The various bulk operations for the Map interface are now implemented to utilize the entire cluster in the new Infinispan 7.0 Release [1].
Please find more information about the bulk operations at [2] [1] http://blog.infinispan.org/2014/11/infinispan-700final-is-out.html [2] http://blog.infinispan.org/2014/11/why-doesnt-mapsize-return-size-of.html On Tue, Oct 14, 2014 at 8:11 AM, William Burns <[email protected]> wrote: > On Tue, Oct 14, 2014 at 3:33 AM, Dan Berindei <[email protected]> wrote: >> >> >> On Tue, Oct 14, 2014 at 9:55 AM, Radim Vansa <[email protected]> wrote: >>> >>> On 10/13/2014 05:55 PM, Mircea Markus wrote: >>> > On Oct 13, 2014, at 14:06, Dan Berindei <[email protected]> wrote: >>> > >>> >> >>> >> On Fri, Oct 10, 2014 at 9:01 PM, Mircea Markus <[email protected]> >>> >> wrote: >>> >> >>> >> On Oct 10, 2014, at 17:30, William Burns <[email protected]> wrote: >>> >> >>> >>>>>>> Also we didn't really talk about the fact that these methods would >>> >>>>>>> ignore ongoing transactions and if that is a concern or not. >>> >>>>>>> >>> >>>>>> It might be a concern for the Hibernate 2LC impl, it was their TCK >>> >>>>>> that >>> >>>>>> prompted the last round of discussions about clear(). >>> >>>>> Although I wonder how much these methods are even used since they >>> >>>>> only >>> >>>>> work for Local, Replication or Invalidation caches in their current >>> >>>>> state (and didn't even use loaders until 6.0). >>> >>>> >>> >>>> There is some more information about the test in the mailing list >>> >>>> discussion >>> >>>> [1] >>> >>>> There's also a JIRA for clear() [2] >>> >>>> >>> >>>> I think 2LC almost never uses distribution, so size() being >>> >>>> local-only >>> >>>> didn't matter, but making it non-tx could cause problems - at least >>> >>>> for that >>> >>>> particular test. >>> >>> I had toyed around with the following idea before, but I never thought >>> >>> of it in the scope of the size method solely, but I have a solution >>> >>> that would work mostly for transactional caches. Essentially the size >>> >>> method would always operate in a READ_COMMITTED like state, using >>> >>> REPEATABLE_READ doesn't seem feasible since we can't keep all the >>> >>> contents in memory. Essentially the iterator would be ran and for >>> >>> each key that is found it checks the context to see if it is there. >>> >>> If the context entry is marked as removed it doesn't count the key, if >>> >>> the key is there it marks the key as found and counts it, and if it is >>> >>> not found it counts it. Then after iteration it finds all the keys in >>> >>> the context that were not found and also adds them to the count. This >>> >>> way it doesn't need to store additional memory (besides iteration >>> >>> costs) as all the context information is in memory. >>> >> sounds good to me. >>> >> >>> >> Mircea, you have to decide whether you want the precise estimation >>> >> using the entry iterator or the loose estimation using >>> >> dataContainer.size() >>> >> :) >>> >> >>> >> I guess we can't make size() read everything into the invocation >>> >> context, so READ_COMMITTED is all we can provide if we want to keep >>> >> size() >>> >> transactional. Maybe we don't really need it though... Will, could you >>> >> investigate the failing test that started the clear() thread [1] to see >>> >> if >>> >> it really needs size() to be transactional? >>> > I'm okay with both approaches TBH, both are much better than what we >>> > currently have. The accurate one is more costly but seems to be the >>> > solution >>> > of choice so let's go for it. >>> > >>> >> >>> >>> My original thought was to also make the EntryIterator transactional >>> >>> in the same way which also means the keySet, entrySet and values >>> >>> methods could do the same things. The main reason stumbling block I >>> >>> had was the fact that the iterator and various collections returned >>> >>> could be used outside of the ongoing transaction which didn't seem to >>> >>> make much sense to me. But maybe these should be changed to be more >>> >>> like backing maps which HashMap, ConcurrentHashMap etc use for their >>> >>> methods, where instead it would pick up the transaction if there is >>> >>> one in the current thread and if there is no transaction just start an >>> >>> implicit one. >>> >> or if they are outside of a transaction to deny progress >>> >> >>> >> I don't think it's fair to require an explicit transaction for every >>> >> entrySet(). It should be possible to start an iteration without a >>> >> transaction, and only to invalidate an iteration started from an explicit >>> >> transaction the moment the transaction is committed/rolled back >>> >> (although it >>> >> would complicate rules a bit). >>> >> >>> >> And what happens if the user writes to the cache while it's iterating >>> >> through the cache-backed collection? Should the user see the new entry in >>> >> the iteration, or not? I don't think you can figure out at the end of the >>> >> iteration which keys were included without keeping all the keys on the >>> >> originator. >>> > If the modification is done outside the iterator one might expect an >>> > ConcurrentModificationException, as it is the case with some JDK >>> > iterators. >>> >>> -1 We're aiming at high performance cache with a lot of changes while >>> the operation is executed. This way, the iteration would never complete, >>> unless you explicitly switch the cache to read only mode (either through >>> Infinispan operation or in application). >> >> >> I was referring only to changes made in the same transaction, not changes >> made by other transactions. But you make a good point, we can't throw a >> ConcurrentModificationException if the user for writes in the same >> transaction and ignore other transactions. >> >>> >>> >>> I think that adding isCacheModified() or isTopologyChanged() to the >>> iterator would make sense, if that's not too complicated to implement. >>> Though, if we want non-disturbed iteration, snapshot isolation is the >>> only answer. >> >> >> isCacheModified() is probably too costly to implement. >> isTopologyChanged() could be done, but I'm not sure what's the use case, as >> the entry iterator abstracts topology changes from the user. >> >> I don't think we want undisturbed iteration, at least not at this point. >> Personally, I just want to have a good story on why the iteration behaves in >> a certain way. By my standards, explaining that changes made by other >> transactions may completely/partially/not at all be visible in the iteration >> is fine, explaining that changes made by the same transaction may or may not >> be visible is not. > > Sorry I didn't respond earlier. But these commands would check the > transaction context before returning the value to the user. This > requires a user interaction for this to occur, so we can guarantee > they will always see their updated value if they have one in the > transaction (even if one is ran in between the iteration). The big > thing is whether or not another transaction's update is seen when we > don't have an update for that key (that will be occur if the segment > is completed before the update or not). > > There should be no need to tell if the cache modified or the topology > changed (the former would require a very high impact for performance > to tell with a DIST cache). > > To be honest the wrapper classes would just be delegating to the Cache > for the vast majority of operations (get, remove, contains etc.). It > would only be when someone specifically uses the iterator on the > various collections that the distributed iterator would even be used. > This way the various collections would be backing maps, like HashMap > and ConcurrentHashMap have, just they have to check the transaction as > well. The values collection would be extremely limited in its > supported methods though, pretty much only to iteration and size. > >> >> >> Cheers >> Dan >> >> >> _______________________________________________ >> infinispan-dev mailing list >> [email protected] >> https://lists.jboss.org/mailman/listinfo/infinispan-dev _______________________________________________ infinispan-dev mailing list [email protected] https://lists.jboss.org/mailman/listinfo/infinispan-dev
