(+dgq)

I think this is actually a question to be addressed in the load-balancing
affinity design, which David is working on.  I suspect that the main thing
we need to do is to expose to the LB policy the request metadata indicating
that a request is a retry, so that the policy can use that information to
make its decision.  Then it's up to the LB policy to notice that a request
is a retry and apply any necessary logic for that case.
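A minimal Python sketch of what such retry-aware logic inside an LB policy could look like. Everything here is hypothetical: `RetryAwarePicker`, the `is_retry` flag (standing in for whatever retry metadata ends up being exposed), and the per-call set of backends already tried (assumed to be tracked by the policy itself, since the retry metadata will not say where earlier attempts went) are illustrative assumptions, not part of any gRPC API.

```python
import random
from typing import Optional, Sequence, Set


class RetryAwarePicker:
    """Hypothetical LB picker that steers retries/hedges away from backends already tried.

    Assumes the policy is told whether an attempt is a retry and that it
    records, per call, which backends earlier attempts were sent to.
    """

    def __init__(self, backends: Sequence[str], rng: Optional[random.Random] = None):
        self.backends = list(backends)
        self.rng = rng if rng is not None else random.Random()

    def pick(self, is_retry: bool, tried: Set[str]) -> str:
        if is_retry:
            # Prefer backends this call has not used yet.
            candidates = [b for b in self.backends if b not in tried]
            if candidates:
                return self.rng.choice(candidates)
        # First attempt, or every backend already tried: plain uniform pick.
        return self.rng.choice(self.backends)


picker = RetryAwarePicker(["replica-a", "replica-b", "replica-c"])
first = picker.pick(is_retry=False, tried=set())
hedge = picker.pick(is_retry=True, tried={first})
print(first, hedge)  # the hedge always differs from the first backend
```

The fallback to a plain pick when every backend has been tried keeps the call from failing outright; whether that is the right behavior is a policy decision left open here.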


On Sun, Feb 12, 2017 at 7:26 PM, <[email protected]> wrote:

> > We are not supporting explicit load balancing constraints for retries.
> The retry attempt or hedged RPC will be re-resolved through the
> load-balancer, so it's up to the service owner to ensure that this has a
> low-likelihood of issuing the request to the same backend.
>
> That seems fairly difficult for any service with request-dependent routing
> semantics. Let's use a DFS as an example: many DFSes maintain N replicas of
> a given file block. In the case where you send a hedged request for a
> block, your likelihood is 1/N of requerying the same DFS node, which might
> well have a slow disk. At least for us using HDFS, N=3 most of the time,
> giving a 33% chance of requerying the same node. Even assuming a smart
> load balancing service which intelligently removes poorly performing
> storage nodes from service, it still seems desirable to ensure hedged
> requests go to a different node. Not having a story for more informed load
> balancing seems like it makes a lot of use cases more difficult than they
> need to be.
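To make the 1/N figure concrete, here is a small Python sketch that enumerates the choices exactly. It assumes the original request and the hedge each pick a replica uniformly at random (real LB policies may weight replicas differently), and compares that with a hypothetical policy that excludes the replica already tried.

```python
from fractions import Fraction


def p_same_replica(n: int, exclude_previous: bool) -> Fraction:
    """Exact probability that a hedged request lands on the same replica as the original.

    Assumes the original picks uniformly among n replicas, and the hedge picks
    uniformly among all n, or (if exclude_previous) among the other n - 1.
    """
    total = Fraction(0)
    for first in range(n):
        choices = [r for r in range(n) if not (exclude_previous and r == first)]
        for second in choices:
            if second == first:
                # P(original = first) * P(hedge = first | original = first)
                total += Fraction(1, n) * Fraction(1, len(choices))
    return total


print(p_same_replica(3, exclude_previous=False))  # 1/3, i.e. about 33% for N=3
print(p_same_replica(3, exclude_previous=True))   # 0
```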
>
> Regards,
> Michael
>
> On Sunday, February 12, 2017 at 7:24:59 PM UTC-7, Eric Gribkoff wrote:
>>
>> Hi Michael,
>>
>> Thanks for the feedback. Responses to your questions (and Josh's
>> follow-up question on retry backoff times) are inline below.
>>
>> On Sat, Feb 11, 2017 at 1:57 PM, 'Michael Rose' via grpc.io <
>> [email protected]> wrote:
>>
>>> A few questions:
>>>
>>> 1) Under this design, is it possible to add load balancing constraints
>>> for retried/hedged requests? Especially during hedging, I'd like to be able
>>> to try a different server since the original server might be garbage
>>> collecting or have otherwise collected a queue of requests such that a
>>> retry/hedge to this server will not be very useful. Or, perhaps the key I'm
>>> looking up lives on a specific subset of storage servers and therefore
>>> should be balanced to that specific subset. While that's the domain of an LB
>>> policy, what information will hedging/retries provide to the LB policy?
>>>
>>>
>> We are not supporting explicit load balancing constraints for retries.
>> The retry attempt or hedged RPC will be re-resolved through the
>> load-balancer, so it's up to the service owner to ensure that this has a
>> low-likelihood of issuing the request to the same backend. This is part of
>> a decision to keep the retry design as simple as possible while satisfying
>> the majority of use cases. If your load-balancing policy has a high
>> likelihood of sending requests to the same server each time, hedging (and
>> to some extent retries) will be less useful regardless. There will be
>> metadata attached to the call indicating that it's a retry, but it won't
>> include information about which servers the previous requests went to.
>>
>>
>>
>>> 2) "Clients cannot override retry policy set by the service config." --
>>> is this intended for inside Google? How about gRPC users outside of Google
>>> who don't use the DNS mechanism to push configuration? It seems like
>>> having a client override for retry/hedging policy is pragmatic.
>>>
>>>
>> In general, we don't want to support client specification of retry
>> policies. Decisions about which methods are safe to retry or hedge, the
>> potential for increased load, etc., should really be left to the service
>> owner. The retry policy will definitely be a part of the service config.
>> While there are still some security-related discussions about the exact
>> delivery mechanism for the service config and retry policies, I think your
>> concern here should be part of the service config design discussion rather
>> than something specific to retry support.
>>
>>
>>> 3) Retry backoff time -- if I'm reading it right, it will always retry
>>> in random(0, current_backoff) milliseconds. What's your feeling on this vs.
>>> a retry w/ configurable jitter parameter (e.g., linear 1000ms increase w/
>>> 10% jitter). Is it OK if there's no minimum backoff?
>>>
>>>
>> You are reading the backoff time correctly. There are a number of ways of
>> doing this (see https://www.awsarchitectureblog.com/2015/03/backoff.html),
>> but choosing a delay of random(0, current_backoff) is intentional and
>> should generally give the best results. We do not want a configurable
>> "jitter" parameter. Empirically, the retries should have more varied
>> backoff time, and we also do not want to let service owners specify very
>> low values for jitter (e.g., 1% or even 0), as this would cluster all
>> retries tightly together and further contribute to server overloading.
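For comparison, a Python sketch of the two strategies discussed: a delay drawn from random(0, current_backoff) (what the linked AWS article calls "full jitter") versus a linear increase with a fixed percentage of jitter. The growth rule (current_backoff starting at `initial`, multiplied by `multiplier` each attempt and capped at `max_backoff`) and all parameter names and values are assumptions for illustration, not the gRFC's configuration fields.

```python
import random
from typing import List


def full_jitter_delays(attempts: int, initial: float, multiplier: float,
                       max_backoff: float, rng: random.Random) -> List[float]:
    """Delay before each retry drawn uniformly from [0, current_backoff].

    current_backoff grows by `multiplier` per attempt and is capped at
    `max_backoff` (an assumed growth rule for this sketch).
    """
    delays = []
    current = initial
    for _ in range(attempts):
        delays.append(rng.uniform(0, current))
        current = min(current * multiplier, max_backoff)
    return delays


def linear_jitter_delays(attempts: int, step: float, jitter: float,
                         rng: random.Random) -> List[float]:
    """Alternative from the question: base delay grows by `step`, +/- `jitter` fraction."""
    delays = []
    for i in range(1, attempts + 1):
        base = step * i
        delays.append(base * (1 + rng.uniform(-jitter, jitter)))
    return delays


rng = random.Random(42)
print(full_jitter_delays(4, initial=1.0, multiplier=2.0, max_backoff=10.0, rng=rng))
print(linear_jitter_delays(4, step=1.0, jitter=0.10, rng=rng))
```

When many clients retry after the same failure, full-jitter delays spread across the whole [0, current_backoff] window, while the 10% variant keeps every client within 10% of the same value; with jitter set to 0, all clients retry at exactly the same moment, which is the clustering described above.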
>>
>> Best,
>>
>> Eric Gribkoff
>>
>>
>>> Regards,
>>> Michael
>>>
>>> On Friday, February 10, 2017 at 5:31:01 PM UTC-7, [email protected]
>>> wrote:
>>>>
>>>> I've created a gRFC describing the design and implementation plan for
>>>> gRPC Retries.
>>>>
>>>> Take a look at the gRFC on GitHub
>>>> <https://github.com/grpc/proposal/pull/12>.
>>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "grpc.io" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/grpc-io.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/grpc-io/62809dba-3349-4a60-9aa9-ccc044d27f53%40googlegroups.com.
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>



-- 
Mark D. Roth <[email protected]>
Software Engineer
Google, Inc.
