Re: [infinispan-dev] ISPN-3557: interactions between a clear() operation and a Transaction

Galder Zamarreño Wed, 16 Oct 2013 07:16:42 -0700

On Oct 16, 2013, at 3:11 PM, Sanne Grinovero <[email protected]> wrote:


> On 16 October 2013 12:21, Galder Zamarreño <[email protected]> wrote:
>> 
>> Guys, there's some context to this topic that Sanne has left out.
>> 
>> This discussion origins comes from Scott Marlow who's found a bug in a 
>> Hibernate 2LC test. The test itself can be found in [1].
>> 
>> Basically, the test loads a couple of entities into the 2LC, and then calls 
>> evictAll in the cache that evicts all entities from the 2LC. Then, it 
>> expects the size of the cache to be 0 and entities to be gone from the 2LC.
>> 
>> The 2LC has a peculiar way to deal with evictEntityRegions or evictAll, it 
>> doesn't just call clear() and hope for the best. Instead, it marks the cache 
>> as invalid, and next time someone queries the cache, it clears it in a 
>> separate transaction. This is done cluster wide too by sending messages to 
>> other node that the cache region is invalid.
>> 
>> The problem here is that because it's done in a separate transaction, it 
>> doesn't have an impact on the on going transaction, so the assertions after 
>> evict fail. IOW, we implemented evict as a "eventually clear", where 
>> eventually all transactions will stop using the cached data, but any current 
>> transactions will still see data. We did this for performance reasons and to 
>> avoids the issues Sanne mentiones below.
>> 
>> The problem is that we didn't envision this evict method to be used in such 
>> way. I always assumed this would be run outside of a transaction. IOW, you'd 
>> use this when you've updated the database separately (i.e. script, PL/SQL) 
>> and you want to clear the 2LC. I always thought that'd be something run via 
>> JMX or separate from another transaction.
> 
> That's indeed one of the use cases for a clear operation, but in this
> case - which I'd classify as maintenance/manual control - I don't
> think the user even expects a transaction to happen.
> It could also happen by explicit invocation of user code, but I think
> we're in the same use case.
> 
> 
>> But, as Scott has found out, the TCK has some particular expectations of 
>> what happens after a evictEntityRegions or evictAll call within a 
>> transaction :(
> 
> To clarify, that's not just a TCK weirdness that we need to "hack
> around" to pass it, it's an important use case, and as you say very
> different than the "manual" control.
> The scenario is that the user is invoking some batch update statement
> on the database which potentially changes a set of entities, whose ids
> might be unknown. To make a simple example, say the statement adds one
> day to the age of all users and you want such a statement to run on
> each midnight.
> Such statements exist and are quite common (probably within the scope
> of a better use case), and the point is that in such cases, and
> evictAll(users) need to happen, but it should happen in the scope of
> the transaction: isolation-wise, the current Session shouldn't be
> loading any more stale data - and IF it loads from the DB fresh data,
> this newly cached entries need to be confined to the current TX, and
> the changes need to be made visible to other nodes and Sessions only
> at commit: in other words, ideally you should roll back the eviction
> from happening if the statement is rolled back.
> 
> Still considering it's a cache, with "ideally" I mean that it would
> still be correct - although suboptimal - to evict all caches
> immediately (out of transaction): worse case other sessions will have
> slightly more cache misses, but they would still be correct.
> 
> So to summarise: Hibernate expects to be able to use it both in a
> transactional mode and in a non-transactional mode. Steve Ebersole
> pointed out that he would be open to discuss using two differently
> named methods to make the expectation clear, as it currently just
> relies on the fact that a transaction is active or not.
> 
> Dan has a great point about JCache, still we'd need to investigate
> what the spec is saying about transactionality of such a method, it
> would surprise me if it mandated transactionality?

^ Transactions have been left out of the spec [1].

> 
> I'll call myself out on how you're all thinking to implement it, as
> long as the requirements described above are clear: I think they make
> much sense for any general purpose cache and it's not just a weirdness
> of the TCK designer or Hibernate. So the trouble for us is that
> Infinispan no longer can operate on a cache mixing transactional and
> non-transactional operations. That looks like something that needs to
> be addressed, and considering the general use case for a clear I think
> it would suffice to clearly mark clear as a non-transactional
> operation.
> 
> BTW If I really need to clear it.. I don't care about it to be
> consistently applied "all or nothing", it would be much more of a sane
> decision to apply it at least where it worked fine. Being this an XA
> operation seems nonsense.
> 
> That said, in the way a distributed key value store which aims at
> bolting on eventual consistency & co, I would consider it damn
> dangerous to embark in designing operations whose scope is not
> restricted to a well defined set of keys. I tried to pitch my concern
> at that level rather than jumping in the details of how to make the
> TCK work. Conceptually, I think it's very tricky as you won't be able
> to track any form of causality between this operation and subsequent
> write operations. At some point you'll need to introduce tombstone
> markers, and you need a to define what is your "keyset universe" to
> implement such a method.

[1] https://groups.google.com/forum/?fromgroups=#!topic/jsr107/bEHJ5m10HT4

> 
> Cheers,
> Sanne
> 
>> 
>> With this in mind, let me comment on the specifics below:
>> 
>> [1] 
>> https://github.com/scottmarlow/wildfly/blob/b61dc1dddb1f70847d1d82206a366eedf6bc80db/testsuite/integration/basic/src/test/java/org/jboss/as/test/integration/jpa/secondlevelcache/SFSB2LC.java#L82
>> 
>> On Oct 2, 2013, at 1:03 AM, Sanne Grinovero <[email protected]> wrote:
>> 
>>> I'd love to brainstorm about the clear() operation and what it means
>>> on Infinispan.
>>> 
>>> I'm not sure to what extent, but it seems that clear() is designed to
>>> work in a TX, or even create an implicit transaction if needed, but
>>> I'm not understanding how that can work.
>>> 
>>> Obviously a clear() operation isn't listing all keys explicitly. Which
>>> implies that it's undefined on which keys it's going to operate when
>>> it's fired.. that seems like terribly wrong in a distributed key/value
>>> store as we can't possibly freeze the global state and somehow define
>>> a set of keys which are going to be affected, while an explicit
>>> enumeration is needed to acquire the needed locks.
>>> 
>>> It might give a nice safe feeling that, when invoking a clear()
>>> operation in a transaction, I can still abort the transaction to make
>>> it cancel the operation; that's the only good part I can think of: we
>>> can cancel it.
>>> 
>>> I don't think it has anything to do with consistency though? To make
>>> sure you're effectively involving all replicas of all entries in a
>>> consistent way, a lock would need to be acquired on each affected key,
>>> which again implies a need to enumerate all keys, including the
>>> unknown keys which might be hiding in a CacheStore: it's not enough to
>>> broadcast the clear() operation to all nodes and have them simply wipe
>>> their local state as that's never going to deal correctly
>>> (consistently) with in-flight transactions working on different nodes
>>> at different times (I guess enabling Total Order could help but you'd
>>> need to make it mandatory).
>>> 
>>> So let's step back a second and consider what is the use case for
>>> clear() ? I suspect it's primarily a method needed during testing, or
>>> maybe cleanup before a backup is restored (operations), maybe a
>>> manually activated JMX operation to clear the cache in exceptional
>>> cases.
>> 
>> ^ Well, that's precisely the problem of second level cache 
>> evictEntityRegions or evictAll. In some scenarios, the clear is expected to 
>> affect the ongoing transaction too, see above use case from the TCK.
>> 
>>> I don't think there would ever be a need for a clear() operation to
>>> interact with other transactions, so I'd rather make it illegal to
>>> invoke a clear() inside a transaction,
>> 
>> -1, unless you have other ideas to deal with evict calls that I mentioned 
>> above that satisfy what the TCK is after.
>> 
>>> or simply ignore the
>>> transactional scope and have an immediate and distributed effect.
>> 
>> That'd be more inline with what's needed above.
>> 
>>> I'm likely missing something. What terrible consequences would this have?
>> 
>> p.s. For the 2LC use case, I'm trying to see if the clear() could be done in 
>> the same transaction and see if that helps, but no luck yet as that causes 
>> other failures.
>> 
>>> 
>>> Cheers,
>>> Sanne
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> [email protected]
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> 
>> 
>> --
>> Galder Zamarreño
>> [email protected]
>> twitter.com/galderz
>> 
>> Project Lead, Escalante
>> http://escalante.io
>> 
>> Engineer, Infinispan
>> http://infinispan.org
>> 
>> 
>> _______________________________________________
>> infinispan-dev mailing list
>> [email protected]
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> 
> _______________________________________________
> infinispan-dev mailing list
> [email protected]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev


--
Galder Zamarreño
[email protected]
twitter.com/galderz

Project Lead, Escalante
http://escalante.io

Engineer, Infinispan
http://infinispan.org


_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] ISPN-3557: interactions between a clear() operation and a Transaction

Reply via email to