On 16 Jun 2014, at 16:57, Dan Berindei <[email protected]> wrote:
> On Thu, Jun 12, 2014 at 4:54 PM, Galder Zamarreño <[email protected]> wrote:
>
> Hi all,
>
> I’m working on the implementation of this, and the solution noted in the JIRA does not work for situations where you have to return a previous value that might have been overridden due to partial operation application. Example (assuming 1 owner only):
>
> 1. The remote client does a put(“key”, 1) on Node-A.
> 2. The remote client does a replace(“key”, 2) on Node-A, but the operation fails to replicate and gets partially applied on Node-A only.
> 3. The remote client, seeing the replace failed, retries the replace(“key”, 2) on Node-B. replace underneath finds that the previous value of “key” is 2 (since it got partially applied on Node-A), so it succeeds, but the previous value returned is 2, which is wrong. The previous value should have been 1, but this value is gone…
>
> In my view, there are two ways to solve this issue:
>
> 1. Make Hot Rod caches transactional. By doing so, operations are not partially applied: they’re either done fully cluster wide or they’re rolled back. I’ve verified this, and the test passes once you make the cache transactional. The downside is of course performance. IOW, anyone using conditional operations, or relying on previous values, would need transactional caches. This should work well with the retry mechanism being implemented for ISPN-2956, which is still needed. A big question here is whether Hot Rod caches should be transactional by default or vice versa. If they’re transactional, our performance will go down for sure, but we won’t have this type of issue (with retry in place). If you’re not transactional, you are faster but you’re exposed to these edge cases, and you need to consider them when deploying your app, something people might miss, although we could provide WARN messages when conditional operations or Flag.FORCE_RETURN_VALUE are used with non-transactional caches.
>
> The same problem [1] can appear in embedded caches. So if there's an argument to make HotRod caches transactional by default, it should apply to embedded caches as well.
>
> I like the idea of logging a warning when FORCE_RETURN_VALUE or the non-versioned conditional operations are used from HotRod. But we have to be careful with the wording, because inconsistencies can appear in transactional mode as well [2].

I think this (non-transactional by default, and a WARN when such methods are used with a non-transactional cache) might be the best stopgap solution.

> [1] https://issues.jboss.org/browse/ISPN-4286
> [2] https://issues.jboss.org/browse/ISPN-3421

Have you considered other fixes for [2]?

> 2. Get rid of returning the previous value in the Hot Rod protocol for modifying operations. For conditional operations, returning true/false is at least enough to see whether the condition was applied. So replaceIfUnmodified/replace/remove (conditional) would only return true/false. This would be complicated due to the reliance on the Map/ConcurrentMap APIs. Maybe something to consider for when we stop relying on JDK APIs.
>
> I'm not sure we can completely get rid of the return values; even though JCache doesn't extend Map, it still has a getAndPut method.

It does have such a method, but we could potentially make it throw an exception saying that it is not supported.

> I also considered applying corrective actions, but that’s very messy and prone to concurrency issues, so I quickly discarded that.
>
> Yeah, rolling back partial modifications is definitely not an option.
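For reference, the failing sequence at the top of the thread looks roughly like this from the client side. This is only a minimal sketch, assuming the standard Hot Rod Java client API, a server on the default address and a distributed cache with a single owner; the class name is illustrative, and the failover and retry happen inside the client and cluster rather than in user code.

    import org.infinispan.client.hotrod.Flag;
    import org.infinispan.client.hotrod.RemoteCache;
    import org.infinispan.client.hotrod.RemoteCacheManager;

    public class ReplaceRetryExample {
       public static void main(String[] args) {
          // Connects to a Hot Rod server on the default host/port.
          RemoteCacheManager rcm = new RemoteCacheManager();
          RemoteCache<String, Integer> cache = rcm.getCache();

          cache.put("key", 1); // step 1: written on Node-A

          // Steps 2-3: FORCE_RETURN_VALUE asks the server to ship the previous
          // value back. If the replace is applied on Node-A, replication fails
          // and the client retries against Node-B, 'previous' can come back as
          // 2 instead of the expected 1, because the original value is gone.
          Integer previous = cache.withFlags(Flag.FORCE_RETURN_VALUE).replace("key", 2);
          System.out.println("previous = " + previous);

          rcm.stop();
       }
    }

With option 1 (a transactional cache backing the Hot Rod endpoint), the partial apply in step 2 could not survive, so the retried replace would still see the value written in step 1.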
> Any other options? Thoughts on the options above?
>
> I was waiting for Sanne to say this, but how about keeping a version history?
>
> If we had the chain of values/versions for each key, we could look up the version of the current operation in the chain when retrying. If it's there, we could infer that the operation was already applied, and return the value before it in the chain as the previous value.
>
> Of course, there are the usual downsides:
> 1. Maintaining the history might be tricky, especially around state transfers.
> 2. Performance will also be affected, maybe getting closer to the tx performance.

Yeah, keeping a version history could help, but it’d be quite a beast to implement for 7.0, including garbage collection of old versions, etc.
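To make the version-history idea concrete, here is a rough, purely illustrative sketch of a per-key chain of versioned values. The class and method names are hypothetical (not an existing Infinispan API), the version is assumed to be supplied with the write, and the hard parts called out above (state transfer, concurrent writers, garbage collection of old versions) are deliberately ignored.

    import java.util.Deque;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentLinkedDeque;

    // Hypothetical per-key history: newest entry first in each chain.
    final class ValueHistory<K, V> {

       static final class Entry<V> {
          final long version; // version assigned to the write
          final V value;      // value written by that operation
          Entry(long version, V value) { this.version = version; this.value = value; }
       }

       private final Map<K, Deque<Entry<V>>> chains = new ConcurrentHashMap<>();

       // Record a write and return the value it overwrote (null if the entry was created).
       V write(K key, long version, V value) {
          Deque<Entry<V>> chain = chains.computeIfAbsent(key, k -> new ConcurrentLinkedDeque<>());
          Entry<V> current = chain.peekFirst();
          chain.addFirst(new Entry<>(version, value));
          return current == null ? null : current.value;
       }

       // On a retry: if the operation's version is already in the chain, it was
       // applied before, so the "previous value" is whatever sits just below it.
       // Returns null both for "entry was created by this write" and for
       // "version not found"; a real implementation would distinguish the two.
       V previousValueFor(K key, long version) {
          Deque<Entry<V>> chain = chains.get(key);
          if (chain == null) {
             return null;
          }
          boolean foundOwnWrite = false;
          for (Entry<V> e : chain) {          // iterates newest -> oldest
             if (foundOwnWrite) {
                return e.value;               // the value the retried write replaced
             }
             if (e.version == version) {
                foundOwnWrite = true;
             }
          }
          return null;
       }
    }

On a retry, the new primary owner would consult something like previousValueFor(key, version) instead of returning the value currently stored, which is exactly the lookup described above.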
>
> Cheers,
>
> On 26 May 2014, at 18:11, Galder Zamarreño <[email protected]> wrote:
>
> > Hi all,
> >
> > I looked into ISPN-2956 last week and I think we have a solution for it, which requires a protocol change [1].
> >
> > Since we’re in the middle of the Hot Rod 2.0 development, this is a good opportunity to implement it.
> >
> > Cheers,
> >
> > [1] https://issues.jboss.org/browse/ISPN-2956?focusedCommentId=12970541&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12970541
> >
> > On 14 May 2014, at 09:36, Dan Berindei <[email protected]> wrote:
> >
> >> On Tue, May 13, 2014 at 6:40 PM, Radim Vansa <[email protected]> wrote:
> >> On 05/13/2014 03:58 PM, Dan Berindei wrote:
> >>>
> >>> On Mon, May 12, 2014 at 1:54 PM, Radim Vansa <[email protected]> wrote:
> >>> @Dan: It's absolutely correct to do the further writes in order to make the cache consistent; I am not arguing against that. You've fixed the outcome (state of the cache) well. My point was that we should let the user know that the value he gets is not 100% correct when we already know that - and, given the API, the only option to do that seems to me to be throwing an exception.
> >>>
> >>> The problem, as I see it, is that users also expect methods that throw an exception to *not* modify the cache. So we would break some of the users' expectations anyway.
> >>
> >> When the response from the primary owner does not arrive soon, we throw a timeout exception and the cache is modified anyway, isn't it? If we throw ~ReturnValueUnreliableException, the user has at least some chance to react. Currently, for code requiring a 100% reliable value, you can't do anything but ignore the return value, even for CAS operations.
> >>
> >> Yes, but we don't expect the user to handle a TimeoutException in any meaningful way. Instead, we expect the user to choose his hardware and configuration to avoid timeouts, if he cares about consistency. How could you handle an exception that tells you "I may have written the value you asked me to in the cache, or maybe not. Either way, you will never know what the previous value was. Muahahaha!" in an application that cares about consistency?
> >>
> >> But the proposed ReturnValueUnreliableException can't be avoided by the user; it has to be handled every time the cluster membership changes. So it would be more like WriteSkewException than TimeoutException. And when we throw a WriteSkewException, we don't write anything to the cache.
> >>
> >> Remember, most users do not care about the previous value at all - that's the reason why JCache and our HotRod client don't return the previous value by default. Those that do care about the previous value use the conditional write operations, and those already work (well, except for the scenario below). So you would force everyone to handle an exception that they don't care about.
> >>
> >> It would make sense to throw an exception if we didn't return the previous value by default and the user requested the return value explicitly. But we do return the value by default, so I don't think it would be a good idea for us.
> >>
> >>> @Sanne: I was not suggesting that for now - sure, value versioning is (I hope) on the roadmap. But that's more complicated; I was thinking just about making an adjustment to the current implementation.
> >>>
> >>> Actually, just keeping a history of values would not fix the return value in all cases.
> >>>
> >>> When retrying a put on the new primary owner, the primary owner would still have to compare our value with the latest value, and return the previous value if they are equal. So we could have something like this:
> >>>
> >>> A is the originator, B is the primary owner, k = v0
> >>> A -> B: put(k, v1)
> >>> B dies before writing v1, C is now primary owner
> >>> D -> C: put(k, v1) // another put operation from D, with the same value
> >>> C -> D: null
> >>> A -> C: retry_put(k, v1)
> >>> C -> A: v0 // C assumes A is overwriting its own value, so it's returning the previous one
> >>>
> >>> To fix that, we'd need a unique version generated by the originator - kind of like a transaction id ;)
> >>
> >> Is it such a problem to associate a unique ID with each write? The history implementation seems to me like the more complicated part.
> >>
> >> I also think maintaining a version history would be quite complicated, and it would also make it harder for users to estimate their cache's memory usage. That's why I was trying to show that it's not a panacea.
> >>
> >>> And to fix the HotRod use case, the HotRod client would have to be the one generating the version.
> >>
> >> I agree.
> >>
> >> Radim
> >>
> >>> Cheers
> >>> Dan
> >>>
> >>> Radim
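As an aside, the "unique version generated by the originator" idea above can be sketched as follows. This is purely hypothetical and not an existing Infinispan or Hot Rod API: if every write carried a client-generated operation id, the primary owner could remember the result it returned when the write was first applied and hand back the same result on a retry, instead of comparing values. Eviction of old ids, state transfer and concurrency details are ignored.

    import java.util.Map;
    import java.util.Optional;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical primary owner that deduplicates retried writes by an
    // originator-generated operation id.
    final class DedupingPrimaryOwner<K, V> {

       private final Map<K, V> store = new ConcurrentHashMap<>();
       // operation id -> previous value returned when the write was first applied
       private final Map<Long, Optional<V>> appliedWrites = new ConcurrentHashMap<>();

       V put(long operationId, K key, V value) {
          Optional<V> remembered = appliedWrites.get(operationId);
          if (remembered != null) {
             // Retry of a write we already applied: return the remembered result,
             // even if another client wrote the same value in the meantime.
             return remembered.orElse(null);
          }
          V previous = store.put(key, value);
          appliedWrites.put(operationId, Optional.ofNullable(previous));
          return previous;
       }
    }

In the example above, C could then tell A's retry_put apart from D's independent put of the same value, because the two writes would carry different operation ids.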
> >>> On 05/12/2014 12:02 PM, Sanne Grinovero wrote:
> >>>> I don't think we are in a position to decide what is a reasonable compromise; we can do better. For example - as Radim suggested - it might seem reasonable to have the older value around for a little while. We'll need a little bit of history of values and tombstones anyway for many other reasons.
> >>>>
> >>>> Sanne
> >>>>
> >>>> On 12 May 2014 09:37, Dan Berindei <[email protected]> wrote:
> >>>>> Radim, I would contend that the first and foremost guarantee that put() makes is to leave the cache in a consistent state. So we can't just throw an exception and give up, leaving k=v on one owner and k=null on another.
> >>>>>
> >>>>> Secondly, put(k, v) being atomic means that either it succeeds, writes k=v in the cache and returns the previous value, or it doesn't succeed and doesn't write k=v in the cache. Returning the wrong previous value is bad, but leaving k=v in the cache is just as bad, even if all the owners have the same value.
> >>>>>
> >>>>> And last, we can't have one node seeing k=null, then k=v, then k=null again, when the only write we did on the cache was a put(k, v). So trying to undo the write would not help.
> >>>>>
> >>>>> In the end, we have to make a compromise, and I think returning the wrong value in some of the cases is a reasonable compromise. Of course, we should document that :)
> >>>>>
> >>>>> I also believe ISPN-2956 could be fixed so that HotRod behaves just like embedded mode after the ISPN-3422 fix, by adding a RETRY flag to the HotRod protocol and to the cache itself.
> >>>>>
> >>>>> Incidentally, transactional caches have a similar problem when the originator leaves the cluster: ISPN-3421 [1]. And we can't handle transactional caches any better than non-transactional caches until we expose transactions to the HotRod client.
> >>>>>
> >>>>> [1] https://issues.jboss.org/browse/ISPN-3421
> >>>>>
> >>>>> Cheers
> >>>>> Dan
> >>>>>
> >>>>> On Mon, May 12, 2014 at 10:21 AM, Radim Vansa <[email protected]> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> recently I've stumbled upon an already expected behaviour (one instance is [1]) which did not get much attention.
> >>>>>>
> >>>>>> In a non-tx cache, when the primary owner fails after the request has been replicated to the backup owner, the request is retried in the new topology. Then, the operation is executed on the new primary (the previous backup). The outcome has already been fixed in [2], but the return value may be wrong. For example, when we do a put, the return value for the second attempt will be the currently inserted value (although the entry was just created). The same situation may happen for other operations.
> >>>>>>
> >>>>>> Currently, it's not possible to return the correct value (because it has already been overwritten and we don't keep a history of values), but shouldn't we rather throw an exception if we were not able to fulfil the API contract?
> >>>>>>
> >>>>>> Radim
> >>>>>>
> >>>>>> [1] https://issues.jboss.org/browse/ISPN-2956
> >>>>>> [2] https://issues.jboss.org/browse/ISPN-3422
> >>>>>>
> >>>>>> --
> >>>>>> Radim Vansa <[email protected]>
> >>>>>> JBoss DataGrid QA
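Tying this back to option 1 at the top of the thread, and to the point that the same reasoning applies to embedded caches: avoiding partially applied operations boils down to configuring the cache as transactional. Below is a minimal embedded-mode sketch, assuming the standard ConfigurationBuilder/DefaultCacheManager API; the cache name is arbitrary and the single-owner DIST_SYNC settings simply mirror the example above.

    import org.infinispan.Cache;
    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.configuration.global.GlobalConfigurationBuilder;
    import org.infinispan.manager.DefaultCacheManager;
    import org.infinispan.transaction.TransactionMode;

    public class TransactionalCacheExample {
       public static void main(String[] args) {
          ConfigurationBuilder builder = new ConfigurationBuilder();
          // Distributed, synchronous, single owner - matching the scenario above.
          builder.clustering().cacheMode(CacheMode.DIST_SYNC).hash().numOwners(1);
          // Transactional, so a write is either applied fully or rolled back.
          builder.transaction().transactionMode(TransactionMode.TRANSACTIONAL);
          Configuration cfg = builder.build();

          DefaultCacheManager manager =
                new DefaultCacheManager(GlobalConfigurationBuilder.defaultClusteredBuilder().build());
          manager.defineConfiguration("hotrod-backed-cache", cfg);
          Cache<String, Integer> cache = manager.getCache("hotrod-backed-cache");

          cache.put("key", 1);
          Integer previous = cache.replace("key", 2); // 1, unless another writer intervened
          System.out.println("previous = " + previous);

          manager.stop();
       }
    }

Whether this should become the default for Hot Rod caches is exactly the performance trade-off debated earlier in the thread.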
--
Galder Zamarreño
[email protected]
twitter.com/galderz

_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
