On Tue, Oct 14, 2014 at 9:55 AM, Radim Vansa <[email protected]> wrote:
> On 10/13/2014 05:55 PM, Mircea Markus wrote: > > On Oct 13, 2014, at 14:06, Dan Berindei <[email protected]> wrote: > > > >> > >> On Fri, Oct 10, 2014 at 9:01 PM, Mircea Markus <[email protected]> > wrote: > >> > >> On Oct 10, 2014, at 17:30, William Burns <[email protected]> wrote: > >> > >>>>>>> Also we didn't really talk about the fact that these methods would > >>>>>>> ignore ongoing transactions and if that is a concern or not. > >>>>>>> > >>>>>> It might be a concern for the Hibernate 2LC impl, it was their TCK > that > >>>>>> prompted the last round of discussions about clear(). > >>>>> Although I wonder how much these methods are even used since they > only > >>>>> work for Local, Replication or Invalidation caches in their current > >>>>> state (and didn't even use loaders until 6.0). > >>>> > >>>> There is some more information about the test in the mailing list > discussion > >>>> [1] > >>>> There's also a JIRA for clear() [2] > >>>> > >>>> I think 2LC almost never uses distribution, so size() being local-only > >>>> didn't matter, but making it non-tx could cause problems - at least > for that > >>>> particular test. > >>> I had toyed around with the following idea before, but I never thought > >>> of it in the scope of the size method solely, but I have a solution > >>> that would work mostly for transactional caches. Essentially the size > >>> method would always operate in a READ_COMMITTED like state, using > >>> REPEATABLE_READ doesn't seem feasible since we can't keep all the > >>> contents in memory. Essentially the iterator would be ran and for > >>> each key that is found it checks the context to see if it is there. > >>> If the context entry is marked as removed it doesn't count the key, if > >>> the key is there it marks the key as found and counts it, and if it is > >>> not found it counts it. Then after iteration it finds all the keys in > >>> the context that were not found and also adds them to the count. This > >>> way it doesn't need to store additional memory (besides iteration > >>> costs) as all the context information is in memory. > >> sounds good to me. > >> > >> Mircea, you have to decide whether you want the precise estimation > using the entry iterator or the loose estimation using dataContainer.size() > :) > >> > >> I guess we can't make size() read everything into the invocation > context, so READ_COMMITTED is all we can provide if we want to keep size() > transactional. Maybe we don't really need it though... Will, could you > investigate the failing test that started the clear() thread [1] to see if > it really needs size() to be transactional? > > I'm okay with both approaches TBH, both are much better than what we > currently have. The accurate one is more costly but seems to be the > solution of choice so let's go for it. > > > >> > >>> My original thought was to also make the EntryIterator transactional > >>> in the same way which also means the keySet, entrySet and values > >>> methods could do the same things. The main reason stumbling block I > >>> had was the fact that the iterator and various collections returned > >>> could be used outside of the ongoing transaction which didn't seem to > >>> make much sense to me. But maybe these should be changed to be more > >>> like backing maps which HashMap, ConcurrentHashMap etc use for their > >>> methods, where instead it would pick up the transaction if there is > >>> one in the current thread and if there is no transaction just start an > >>> implicit one. > >> or if they are outside of a transaction to deny progress > >> > >> I don't think it's fair to require an explicit transaction for every > entrySet(). It should be possible to start an iteration without a > transaction, and only to invalidate an iteration started from an explicit > transaction the moment the transaction is committed/rolled back (although > it would complicate rules a bit). > >> > >> And what happens if the user writes to the cache while it's iterating > through the cache-backed collection? Should the user see the new entry in > the iteration, or not? I don't think you can figure out at the end of the > iteration which keys were included without keeping all the keys on the > originator. > > If the modification is done outside the iterator one might expect an > ConcurrentModificationException, as it is the case with some JDK iterators. > > -1 We're aiming at high performance cache with a lot of changes while > the operation is executed. This way, the iteration would never complete, > unless you explicitly switch the cache to read only mode (either through > Infinispan operation or in application). > I was referring only to changes made in the same transaction, not changes made by other transactions. But you make a good point, we can't throw a ConcurrentModificationException if the user for writes in the same transaction and ignore other transactions. > > I think that adding isCacheModified() or isTopologyChanged() to the > iterator would make sense, if that's not too complicated to implement. > Though, if we want non-disturbed iteration, snapshot isolation is the > only answer. > isCacheModified() is probably too costly to implement. isTopologyChanged() could be done, but I'm not sure what's the use case, as the entry iterator abstracts topology changes from the user. I don't think we want undisturbed iteration, at least not at this point. Personally, I just want to have a good story on why the iteration behaves in a certain way. By my standards, explaining that changes made by other transactions may completely/partially/not at all be visible in the iteration is fine, explaining that changes made by the same transaction may or may not be visible is not. Cheers Dan
_______________________________________________ infinispan-dev mailing list [email protected] https://lists.jboss.org/mailman/listinfo/infinispan-dev
