Just to finish up the discussion, at least from my side, I will answer one by one:
@Christian Rivasseau I agree that this is not the correct forum, but since I wasn't the one who opened the issue, I found it handy to re-ask in a place where lots of expert people could share their own experiences. Sorry about that!

@Sankate Sharma I don't think ad-hoc retry logic, whether at the application level or in a service mesh (Linkerd, Istio, or even a pure Envoy setup), is a solution: it only delays the problem or makes it less probable, it doesn't solve it.

@Robert Engels Thanks for your ideas and links. IMHO, after reading a lot about this, I would try the following, in order:

* Avoid distributed transactions: try to fit the entities that must change together into a single transaction inside an existing bounded context
* Use the saga pattern plus eventual consistency, with compensation/reconciliation in case of failures
* Use CDC (Change Data Capture) to replicate data and feed other systems

I found this article really interesting and a good summary of the current patterns: https://ebaytech.berlin/data-consistency-in-microservices-architecture-bf99ba31636f

Regards,

On Monday, November 5, 2018 at 8:40:21 PM UTC+1, robert engels wrote:
>
> Not sure if it is the OP, but given the order service example cited, a fairly simple solution that isn't transactional would be:
>
> Use a service that hands out globally unique TxIDs, then:
>
> 1) use the audit service to log "handling order for TxID"
> 2) use the order service to commit the order, attaching the TxID to the Order
> 3) use the audit service to log "handled order for TxID"
>
> Then it is easy to determine what happened:
>
> If #3 exists, you know the order is committed.
> Else if #1 exists, then either:
> A) the order was committed, but the audit logging failed
>    - so ensure the order exists with that TxID and re-log the audit entry
> B) the order failed, so log another audit event "handled order for TxID, failed"
>
> These actions would be performed wherever the audit log is "incomplete".
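Robert's three-step audit protocol and its reconciliation rule can be sketched in a few lines. This is a hedged illustration only: the "services" are plain in-memory structures, and every name here (`handle_order`, `reconcile`, the event strings) is hypothetical, not part of any real API.

```python
# In-memory stand-ins for the audit service and the order service.
audit_log = []   # entries: (tx_id, event)
orders = {}      # committed orders, keyed by tx_id

def handle_order(tx_id, order, fail_commit=False, fail_final_audit=False):
    """Happy path: audit 'handling', commit the order, audit 'handled'.
    The fail_* flags simulate partial failures between the steps."""
    audit_log.append((tx_id, "handling"))        # step 1
    if not fail_commit:
        orders[tx_id] = order                    # step 2: commit, attaching TxID
    if tx_id in orders and not fail_final_audit:
        audit_log.append((tx_id, "handled"))     # step 3

def reconcile(tx_id):
    """Run for any 'incomplete' log: a 'handling' entry with no terminal entry."""
    events = {e for t, e in audit_log if t == tx_id}
    if "handled" in events or "handled, failed" in events:
        return                                       # already complete
    if "handling" in events:
        if tx_id in orders:                          # case A: commit succeeded,
            audit_log.append((tx_id, "handled"))     # only the final audit was lost
        else:                                        # case B: the order failed
            audit_log.append((tx_id, "handled, failed"))
```

The key property is that `reconcile` is idempotent and can be re-run over the audit log at any time, which is what makes the scheme work without a distributed transaction.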
> On Nov 5, 2018, at 1:27 PM, 'Carl Mastrangelo' via grpc.io <[email protected]> wrote:
>
> <Speaking on my own behalf, rather than Google's>
>
> I think the OP hit the nail on the head with REST being a bad fit for transactions. A previous team I worked on had pretty much the same problem. There are two solutions for getting transaction-like semantics:
>
> 1. Make all RPCs compare-and-swap-like, effectively making them atomic operations. When querying an object from the service, every object needs a unique version (something like a timestamp). When making updates, the system compares the modified time on the request object with the one it has stored and makes sure there haven't been any changes. This works for objects that are updated infrequently and which don't involve other dependent objects.
>
> 2. Make a streaming RPC a transaction. One good thing about streaming RPCs is that the messages you send and receive form an effectively consistent snapshot. When you half-close the streaming RPC, it attempts to commit the transaction as a whole, or else returns a failure so you can try again. This makes multi-object updates much easier. The downside is that the API is uglier, because effectively you have a single "Transaction RPC" and all your actual calls are just submessages. It works, but things like stats, auth, interception, etc. get more complicated.
>
> Personally, I would structure my data to prefer option one, even though it is less powerful. I *really* don't like thinking about implementing my own deadlock detection or other lock ordering for an RPC service. If you know locking is not a problem, I think both are valid solutions.
>
> On Monday, November 5, 2018 at 1:16:10 AM UTC-8, [email protected] wrote:
>>
>> Dead issue, but I would like to resurrect it because it wasn't answered at all.
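Carl's option 1 above (version-checked, compare-and-swap-style updates) can be sketched like this. The `Store` class and `FailedPrecondition` exception are assumptions for illustration; in a real gRPC service the conflict would surface as a FAILED_PRECONDITION or ABORTED status rather than a Python exception.

```python
class FailedPrecondition(Exception):
    """Raised when the object changed between the read and the write."""

class Store:
    def __init__(self):
        self._objects = {}   # key -> (version, value)

    def get(self, key):
        """Return (version, value); the client must echo the version back."""
        return self._objects[key]

    def update(self, key, expected_version, new_value):
        """Write new_value only if nothing was written since expected_version.
        expected_version=0 creates the object."""
        current_version, _ = self._objects.get(key, (0, None))
        if current_version != expected_version:
            raise FailedPrecondition("object changed since it was read")
        self._objects[key] = (current_version + 1, new_value)
        return current_version + 1
```

A stale writer gets `FailedPrecondition` and must re-read and retry; this is the same read-modify-write loop as the ETag flow Jorge describes further down-thread.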
>>
>> A simple use case that illustrates the problem: two different services, OrderService (with a CreateOrder method) and AuditService (with an Audit method). You want to create the order and, if everything succeeds, log an audit entry. If you log the entry beforehand, you could end up with an audit record for an order that was never created, because the create-order step failed. If you (try to) log the entry afterwards, the audit step could fail, leaving something that did happen unlogged, which defeats the sole purpose of having an audit log at all.
>>
>> What do you guys at Google do?
>> * Compensate?
>> * Nothing more than live with it?
>> * In this concrete case, keep a custom audit log per service and use CDC (Change Data Capture) to replicate it to the central service?
>>
>> @Jiri, what did you end up doing?
>>
>> Thanks,
>>
>> On Wednesday, September 9, 2015 at 7:47:51 PM UTC+2, Jorge Canizales wrote:
>>>
>>> For Google's JSON/REST APIs we use ETag headers (optimistic concurrency) to do these things. That's easy to implement on top of gRPC, using the request and response metadata to send the equivalent headers.
>>>
>>> On Wednesday, August 5, 2015 at 1:45:53 AM UTC-7, Jiri Jetmar wrote:
>>>>
>>>> Hi guys,
>>>>
>>>> We are (re-)designing a new RPC-based approach for our back-office services and are considering using gRPC. Currently we use REST to call our services, but we have realized over time that designing a nice REST API is a really hard job, and when we look at our internal APIs they look more like RPC than REST. Therefore a shift to pure RPC is a valid alternative. I'm not talking here about public APIs; they will continue to be REST-based.
>>>>
>>>> Now, when a number of microservices are (or can be) distributed, one has to compensate for failures during commands (write interactions, i.e. HTTP POST, PUT, DELETE).
>>>> Currently we are using the TCC (try-confirm-cancel) pattern.
>>>>
>>>> I'm curious how you guys at Google are solving this. How do you solve the issue of distributed transactions on top of RPC services? Do you solve it on a more technical level (e.g. with a kind of transaction monitor), or do you consider it a functional/application-level concern where the calling client has to compensate for failed commands to a service?
>>>>
>>>> Are there any plans to propose something for gRPC.io?
>>>>
>>>> Thank you.
>>>>
>>>> Cheers,
>>>> Jiri
>
> --
> You received this message because you are subscribed to the Google Groups "grpc.io" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/grpc-io.
> To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/97e93eda-97e0-4a88-8c57-66b62a0c9abf%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/b4a0bca1-880a-40fb-b5ee-6e5b04a4a320%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
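Since the TCC (try-confirm-cancel) pattern Jiri mentions never gets spelled out in the thread, here is a minimal sketch. `Participant` and `run_tcc` are hypothetical names for illustration, not any real framework's API: Try tentatively reserves a resource without exposing it, Confirm makes the reservation permanent, and Cancel compensates by releasing it.

```python
class Participant:
    """One service taking part in a TCC interaction."""
    def __init__(self, name):
        self.name = name
        self.state = "initial"

    def try_(self):
        self.state = "tried"      # reserve the resource, but don't expose it yet

    def confirm(self):
        self.state = "confirmed"  # make the reservation permanent

    def cancel(self):
        self.state = "cancelled"  # release the reservation

def run_tcc(participants):
    """Try every participant; confirm all on success, cancel all on failure."""
    tried = []
    try:
        for p in participants:
            p.try_()
            tried.append(p)
    except Exception:
        for p in tried:           # compensate the reservations already made
            p.cancel()
        return False
    for p in participants:
        p.confirm()
    return True
```

Note that this pushes the compensation logic to the caller, which is exactly the functional/application-level trade-off the question asks about.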
