On 11/25/2015 03:24 PM, Pedro Ruivo wrote:
>
> On 11/25/2015 01:20 PM, Radim Vansa wrote:
>> On 11/25/2015 12:07 PM, Sanne Grinovero wrote:
>>> On 25 November 2015 at 10:48, Pedro Ruivo <[email protected]> wrote:
>>>>> An alternative is to wait for all ACKs, but I think this could still
>>>>> be optimised in "triangle shape" too by having the Originator only
>>>>> wait for the ACKs from the non-primary replicas?
>>>>> So backup owners have to send a confirmation message to the
>>>>> Originator, while the Primary owner isn't expecting to do so.
>>>> IMO, we should wait for all ACKs to keep our read design.
>> What exactly is our 'read design'?
> If we don't wait for all the ACKs, then we have to go to the primary
> owner for reads, even if the originator is a Backup owner.

I don't think so, but we probably have some miscommunication. If O = B, we still wait for the reply from B (which is local), which is triggered by receiving an update from P (after P has applied the change locally). So it goes:

OB (application thread) [cache.put()]
  -(unordered)-> P (worker thread) [applies update]
  -(ordered)-> OB (worker thread) [applies update]
  -(in-VM)-> OB (application thread) [continues]

>
>> I think that the source of optimization is that once primary decides to
>> backup the operation, he can forget about it and unlock the entry. So,
>> we don't need any ACK from primary unless it's an exception/noop
>> notification (as with conditional ops). If primary waited for ACK from
>> backup, we wouldn't save anything.
> About the iteration between P -> B, you're right. We don't need to wait
> for the ACKs if the messages are sent in FIFO (and JGroups guarantee that)
>
> About the O -> P, IMO, the Originator should wait for the reply from
> Backup.

I never claimed otherwise: O always needs to wait for the ACKs from the Bs, and only then can it report that the value has been written on all owners. What does this have to do with O -> P?

> At least, the Primary would be the only one who needs to return
> the previous value (if needed) and it can return if the operation
> succeed or not.

* Simple success: no P -> O, B -> O (success)
* Simple failure/non-modifying operation (as with putIfAbsent/functional call): P -> O (failure/custom value), no B -> O
* Previous/custom value (as with replace() or functional call): P -> O (previous/custom value), B -> O (success); the alternative is P -> B (previous/custom value, new value) and B -> O (previous/custom value)
* Exception on either P or B: send the exception to O
* Lost/timed out P -> B: O times out waiting for the ACK from B and throws an exception

> This way, it would avoid forking the code for each type
> of command without any benefit (I'm thinking sending the reply to
> originator in parallel with the update message to the backups).
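
The reply cases listed above can be written down as a small model of when the Originator may complete an operation. This is only an illustrative Python sketch, not Infinispan code: `PrimaryReply`, `originator_done` and the `needs_value` flag are hypothetical names, and exceptions and timeouts are left out (in those cases O simply fails the operation).

```python
from enum import Enum, auto


class PrimaryReply(Enum):
    """What O has received from P so far (hypothetical model type)."""
    NONE = auto()         # nothing from P; the normal case for a simple success
    NOT_APPLIED = auto()  # failed conditional op or non-modifying operation
    VALUE = auto()        # P returned the previous/custom value


def originator_done(needs_value, primary_reply, acked_backups, backups):
    """Return True once O holds every reply it needs.

    needs_value   -- True for commands such as replace() that return a value
    primary_reply -- a PrimaryReply member
    acked_backups -- backups whose ACK has reached O
    backups       -- all backup owners of the key
    """
    if primary_reply is PrimaryReply.NOT_APPLIED:
        # The backups never saw the command, so no B -> O ACK will arrive.
        return True
    all_backups_acked = set(backups) <= set(acked_backups)
    if needs_value:
        # O needs both the value from P and the ACKs from every B.
        return all_backups_acked and primary_reply is PrimaryReply.VALUE
    # Simple success: the ACKs from the backups are enough, P stays silent.
    return all_backups_acked
```

In this model the only branching is on whether the command was replicated to the backups and whether the caller needs a return value.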

What forking of code for each type do you mean? I see only two branches, depending on whether the command is going to be replicated to B or not.

Radim

>
>> The gains are:
>> * less hops (3 instead of 4 if O != P && O != B)
>> * less messages (primary ACK is transitive based on ack from B)
>> * shorter lock times (not locking during P -> B RPC)
>>
>>>> However, the
>>>> Originator needs to wait for the ACK from Primary because of conditional
>>>> operations and functional API.
>>> If the operation is successful, Primary will have to let the
>>> secondaries know so these can reply to the Originator directly: still
>>> saves an hop.
> As I said above: "I'm thinking sending the reply to originator in
> parallel with the update message to the backups"
>
>>>> In this first case, if the conditional operation fail, the Backups are
>>>> not bothered. The latter case, we may need the return value from the
>>>> function.
>>> Right, for a failed or rejected operation the secondaries won't even
>>> know about it,
>>> so the Primary is in charge of letting the Originator know.
>>> Essentially you're highlighting that the Originator needs to wait for
>>> either the response from secondaries (all of them?)
>>> or from the Primary.
>>>
>>>>> I suspect the tricky part is what happens when the Primary owner rules
>>>>> +1 to apply the change, but then the backup owners (all or some of
>>>>> them) somehow fail before letting the Originator know. The Originator
>>>>> in this case should seek confirmation about its operation state
>>>>> (success?) with the Primary owner; this implies that the Primary owner
>>>>> needs to keep track of what it's applied and track failures too, and
>>>>> this log needs to be pruned.
>> Currently, in case of lost (timed out) ACK from B to P, we just report
>> exception and don't care about synchronizing P and B - B can already
>> store updated value.
>> So we don't have to care about rollback on P if replication to B fails
>> either - we just report that it's broken, sorry.
>> Better consolidation API would be nice, though, something like
>> cache.getAllVersions().
>>
>> Radim
>>
>>
>>>>> Sounds pretty nice, or am I missing other difficulties?
>>>>>
>>>>> Thanks,
>>>>> Sanne
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> [email protected]
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>

-- 
Radim Vansa <[email protected]>
JBoss Performance Team

_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
