(+dgq) I think this is actually a question to be addressed in the load-balancing affinity design, which David is working on. I suspect that the main thing we need to do is to expose the request metadata that indicates that a request is a retry to the LB policy, so that it can use that information to make its decision. Then it's up to the LB policy to notice that a request is a retry and apply any necessary logic for that case.
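To make the idea concrete, here is a minimal sketch of an LB policy that reacts to a "this is a retry" flag. It is not the gRPC LB policy API: `RetryAwarePicker`, `call_id`, `is_retry`, and `done` are hypothetical names invented for illustration. Because the retry metadata would not say which servers earlier attempts went to, the sketch assumes the policy records its own picks per call.

```python
import random
from typing import Dict, List, Optional, Set


class RetryAwarePicker:
    """Hypothetical LB policy sketch; not part of any gRPC API."""

    def __init__(self, backends: List[str],
                 rng: Optional[random.Random] = None) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._rng = rng or random.Random()
        # Assumption: attempts of one logical call share a call_id, so the
        # policy can remember which backends it already handed out.
        self._tried: Dict[str, Set[str]] = {}

    def pick(self, call_id: str, is_retry: bool) -> str:
        tried = self._tried.setdefault(call_id, set())
        candidates = self._backends
        if is_retry:
            untried = [b for b in self._backends if b not in tried]
            # Once every backend has been tried, fall back to the full list.
            candidates = untried or self._backends
        choice = self._rng.choice(candidates)
        tried.add(choice)
        return choice

    def done(self, call_id: str) -> None:
        # Forget per-call state when the call completes.
        self._tried.pop(call_id, None)
```

With three replicas, a first attempt followed by two retries or hedges lands on three distinct backends, instead of each attempt having a 1/N chance of hitting the same node again.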
On Sun, Feb 12, 2017 at 7:26 PM, <[email protected]> wrote:

>
> We are not supporting explicit load balancing constraints for retries.
> The retry attempt or hedged RPC will be re-resolved through the
> load-balancer, so it's up to the service owner to ensure that this has a
> low-likelihood of issuing the request to the same backend.
>
> That seems fairly difficult for any service with request-dependent routing
> semantics. Let's use a DFS as an example: many DFSes maintain N replicas of
> a given file block. In the case where you send a hedged request for a
> block, you have a 1/N likelihood of requerying the same DFS node, which
> might well have a slow disk. At least for us using HDFS, N=3 most of the
> time; therefore a 33% chance of requerying the same node. Even assuming a
> smart load balancing service which intelligently removes poorly performing
> storage nodes from service, it still seems desirable to ensure hedged
> requests go to a different node. Not having a story for more informed load
> balancing seems like it makes a lot of use cases more difficult than they
> need to be.
>
> Regards,
> Michael
>
> On Sunday, February 12, 2017 at 7:24:59 PM UTC-7, Eric Gribkoff wrote:
>>
>> Hi Michael,
>>
>> Thanks for the feedback. Responses to your questions (and Josh's
>> follow-up question on retry backoff times) are inline below.
>>
>> On Sat, Feb 11, 2017 at 1:57 PM, 'Michael Rose' via grpc.io <
>> [email protected]> wrote:
>>
>>> A few questions:
>>>
>>> 1) Under this design, is it possible to add load balancing constraints
>>> for retried/hedged requests? Especially during hedging, I'd like to be able
>>> to try a different server since the original server might be garbage
>>> collecting or have otherwise collected a queue of requests such that a
>>> retry/hedge to this server will not be very useful. Or, perhaps the key I'm
>>> looking up lives on a specific subset of storage servers and therefore
>>> should be balanced to that specific subset.
>>> While that's the domain of an LB
>>> policy, what information will hedging/retries provide to the LB policy?
>>>
>>
>> We are not supporting explicit load balancing constraints for retries.
>> The retry attempt or hedged RPC will be re-resolved through the
>> load-balancer, so it's up to the service owner to ensure that this has a
>> low likelihood of issuing the request to the same backend. This is part of
>> a decision to keep the retry design as simple as possible while satisfying
>> the majority of use cases. If your load-balancing policy has a high
>> likelihood of sending requests to the same server each time, hedging (and
>> to some extent retries) will be less useful regardless. There will be
>> metadata attached to the call indicating that it's a retry, but it won't
>> include information about which servers the previous requests went to.
>>
>>> 2) "Clients cannot override retry policy set by the service config." --
>>> is this intended for inside Google? How about gRPC users outside of Google
>>> who don't use the DNS mechanism to push configuration? It seems like
>>> having a client override for retry/hedging policy is pragmatic.
>>>
>>
>> In general, we don't want to support client specification of retry
>> policies. The necessary information about what methods are safe to retry or
>> hedge, the potential for increased load, etc., are really decisions that
>> should be left to the service owner. The retry policy will definitely be a
>> part of the service config. While there are still some security-related
>> discussions about the exact delivery mechanism for the service config and
>> retry policies, I think your concern here should be part of the service
>> config design discussion rather than something specific to retry support.
>>
>>> 3) Retry backoff time -- if I'm reading it right, it will always retry
>>> in random(0, current_backoff) milliseconds. What's your feeling on this vs.
>>> a retry w/ configurable jitter parameter (e.g.
>>> linear 1000ms increase w/ 10% jitter). Is it OK if there's no minimum
>>> backoff?
>>>
>>
>> You are reading the backoff time correctly. There are a number of ways of
>> doing this (see https://www.awsarchitectureblog.com/2015/03/backoff.html),
>> but choosing random(0, current_backoff) is done intentionally and
>> should generally give the best results. We do not want a configurable
>> "jitter" parameter. Empirically, the retries should have more varied
>> backoff time, and we also do not want to let service owners specify very
>> low values for jitter (e.g., 1% or even 0), as this would cluster all
>> retries tightly together and further contribute to server overloading.
>>
>> Best,
>>
>> Eric Gribkoff
>>
>>> Regards,
>>> Michael
>>>
>>> On Friday, February 10, 2017 at 5:31:01 PM UTC-7, [email protected]
>>> wrote:
>>>>
>>>> I've created a gRFC describing the design and implementation plan for
>>>> gRPC Retries.
>>>>
>>>> Take a look at the gRFC on GitHub
>>>> <https://github.com/grpc/proposal/pull/12>.
>>>>
>>>
>>> *CONFIDENTIALITY NOTICE: This email message, and any documents, files or
>>> previous e-mail messages attached to it is for the sole use of the intended
>>> recipient(s) and may contain confidential and privileged information. Any
>>> unauthorized review, use, disclosure or distribution is prohibited. If you
>>> are not the intended recipient, please contact the sender by reply email
>>> and destroy all copies of the original message.*
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "grpc.io" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/grpc-io.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/grpc-io/62809dba-3349-4a60-9aa9-ccc044d27f53%40googlegroups.com.
>>>
>>> For more options, visit https://groups.google.com/d/optout.

--
Mark D. Roth <[email protected]>
Software Engineer
Google, Inc.
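
For reference, the random(0, current_backoff) scheme discussed in the quoted thread can be sketched as follows. This is only an illustration under stated assumptions: the thread does not say how current_backoff grows, so the `initial_ms`, `multiplier`, and `max_ms` parameters and the function name `full_jitter_delays` are hypothetical and chosen for the example.

```python
import random
from typing import Iterator, Optional


def full_jitter_delays(initial_ms: float, multiplier: float, max_ms: float,
                       attempts: int,
                       rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield one delay per retry attempt, drawn from random(0, current_backoff).

    Assumption: current_backoff grows geometrically and is capped at max_ms.
    """
    rng = rng or random.Random()
    current = initial_ms
    for _ in range(attempts):
        # No minimum and no fixed jitter percentage: any value between 0 and
        # the current backoff is possible, which spreads retries out.
        yield rng.uniform(0.0, current)
        current = min(current * multiplier, max_ms)


if __name__ == "__main__":
    for attempt, delay in enumerate(full_jitter_delays(100, 2, 1000, 5), 1):
        print(f"attempt {attempt}: sleep {delay:.1f} ms")
```

Compared with a fixed "1000ms with 10% jitter" schedule, where every client retries within a narrow window, drawing from the whole interval avoids clustering retries tightly together.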
