Netflix has released a gRPC-compatible adaptive client and server throttle 
for Java.

https://medium.com/@NetflixTechBlog/performance-under-load-3e6fa9a60581
https://github.com/Netflix/concurrency-limits
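
Here's a rough sketch of what the server-side wiring might look like with 
grpc-java. The class names below (GrpcServerLimiterBuilder, 
ConcurrencyLimitServerInterceptor, Gradient2Limit) are my reading of the 
concurrency-limits-grpc module and may not match the current API exactly, and 
MyServiceImpl is a placeholder, so treat this as an illustration rather than a 
copy-paste recipe:

    import com.netflix.concurrency.limits.Limiter;
    import com.netflix.concurrency.limits.grpc.server.ConcurrencyLimitServerInterceptor;
    import com.netflix.concurrency.limits.grpc.server.GrpcServerLimiterBuilder;
    import com.netflix.concurrency.limits.grpc.server.GrpcServerRequestContext;
    import com.netflix.concurrency.limits.limit.Gradient2Limit;
    import io.grpc.Server;
    import io.grpc.ServerBuilder;
    import io.grpc.ServerInterceptors;

    public class AdaptiveThrottleServer {
      public static void main(String[] args) throws Exception {
        // Adaptive limiter: the concurrency limit grows and shrinks with
        // observed latency instead of being a fixed number.
        Limiter<GrpcServerRequestContext> limiter = new GrpcServerLimiterBuilder()
            .limit(Gradient2Limit.newDefault())
            .build();

        // RPCs over the current limit are rejected up front instead of queueing.
        Server server = ServerBuilder.forPort(8080)
            .addService(ServerInterceptors.intercept(
                new MyServiceImpl(),  // placeholder service implementation
                ConcurrencyLimitServerInterceptor.newBuilder(limiter).build()))
            .build()
            .start();
        server.awaitTermination();
      }
    }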

On Thursday, March 2, 2017 at 9:28:29 PM UTC-8, Carl Mastrangelo wrote:
>
> Use a Semaphore and call tryAcquire() on it before calling request().  Share 
> the semaphore between all your Server handlers.  You can do this with a 
> ServerInterceptor.  If tryAcquire() times out, you can immediately fail the 
> request.   It is up to you how long you are willing to wait before dropping 
> load.  
>
> Note that calling request() causes a message to be parsed and delivered to 
> your StreamObserver or ServerCall.Listener, whichever API you are using.
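
Here's a rough sketch of that suggestion as a shared interceptor, sized for 
the 20-concurrent / 10-waiting policy described further down in the thread. 
The permit counts, timeout, and status messages are placeholders to tune:

    import io.grpc.ForwardingServerCallListener;
    import io.grpc.Metadata;
    import io.grpc.ServerCall;
    import io.grpc.ServerCallHandler;
    import io.grpc.ServerInterceptor;
    import io.grpc.Status;
    import java.util.concurrent.Semaphore;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicBoolean;

    /**
     * Shared throttle: at most 20 RPCs execute at once, at most 10 more wait
     * briefly for a permit, and anything beyond that is rejected immediately.
     */
    public final class ThrottleInterceptor implements ServerInterceptor {
      private final Semaphore executing = new Semaphore(20); // concurrent executions
      private final Semaphore waiting = new Semaphore(10);   // bounded backlog

      @Override
      public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
          ServerCall<ReqT, RespT> call, Metadata headers,
          ServerCallHandler<ReqT, RespT> next) {
        if (!waiting.tryAcquire()) {
          return reject(call, "too many requests waiting");
        }
        boolean acquired = false;
        try {
          // How long to wait for a permit is a policy decision; 200ms is arbitrary.
          acquired = executing.tryAcquire(200, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          waiting.release();
        }
        if (!acquired) {
          return reject(call, "server overloaded");
        }
        AtomicBoolean released = new AtomicBoolean();
        return new ForwardingServerCallListener.SimpleForwardingServerCallListener<ReqT>(
            next.startCall(call, headers)) {
          @Override public void onComplete() {
            if (released.compareAndSet(false, true)) { executing.release(); }
            super.onComplete();
          }
          @Override public void onCancel() {
            if (released.compareAndSet(false, true)) { executing.release(); }
            super.onCancel();
          }
        };
      }

      private static <ReqT, RespT> ServerCall.Listener<ReqT> reject(
          ServerCall<ReqT, RespT> call, String reason) {
        // Fail fast so the client can back off or try another host.
        call.close(Status.RESOURCE_EXHAUSTED.withDescription(reason), new Metadata());
        return new ServerCall.Listener<ReqT>() {};
      }
    }

Attaching a separate instance per service with 
ServerInterceptors.intercept(service, interceptor), rather than sharing one 
instance across every service, also gives per-service policies without extra 
Servers or ports.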
>
> That said, at least internally, many services deal with the problem of 
> overload by using load balancing and provisioning more serving capacity. 
>  Each team has a different idea about how to handle overload, so we give 
> you the controls to deal with it rather than make a choice that may not be 
> applicable to all users. 
>
> On Wednesday, March 1, 2017 at 1:42:25 PM UTC-8, Ryan Michela wrote:
>>
>> OK, then let's back up.
>>
>> As an example, my DBAs tell me I cannot have an unlimited number of 
>> concurrent connections to the database. Let's call the number 20. I need to 
>> restrict the number of concurrent executions of my service to no more than 
>> 20. During periods of high load, I may receive requests faster than my 20 
>> concurrent executions can handle, causing incoming calls to back up waiting 
>> for available resources. When a backup occurs, I want to start turning 
>> requests away immediately if more than 10 requests are backed up. This way 
>> the caller can try another host rather than keep hammering the overloaded 
>> host.
>>
>> What is the correct way to implement the above throttle policy in gRPC? I 
>> need to implement a standardized throttling behavior for all my services in 
>> a way that doesn't require every developer of every service to insert 
>> the same brittle boilerplate in their services.
>>
>> On Wednesday, March 1, 2017 at 11:35:27 AM UTC-8, Carl Mastrangelo wrote:
>>>
>>> I think you are missing part of the picture.  If a client sends a 
>>> request to you, you have to either put it somewhere, or drop it.  The OS 
>>> will buffer the TCP bytes, but will stall all the other incoming bytes as 
>>> well.  Even if the userspace application, grpc, ignores those bytes, they 
>>> are still stuck in a buffer on your machine.  If this buffer grows too 
>>> much, the OS is going to start dropping packets, and make the connection 
>>> speed plummet.  The client will see this as deadline exceeded errors.  The 
>>> server will see this as head-of-line blocking.  This is a bad way to handle 
>>> overload.
>>>
>>> If you accept the data from the OS into your userspace program, you 
>>> have control over it.  You can prioritize it over other requests. 
>>>  (Remember, HTTP/2 multiplexes many requests over a single connection.  If 
>>> you stall the connection, the RPCs cannot be handled).  Rather than letting 
>>> clients hang and repeatedly try to reconnect, and send more data than you 
>>> can buffer, you can directly fail the RPCs with an error indicating that 
>>> clients should back off.  It is cheap in terms of processing, and decreases 
>>> latency in overload scenarios.  This is a good way to handle overload.  
>>>
>>> Fixing the number of threads in the Server is tantamount to the bad way. 
>>>  The callbacks from the network thread (Netty in this case) will show up on 
>>> the queue of the executor.  The data will build up there until you OOM.  If 
>>> you limit the size of that queue and drop overload, callbacks will not 
>>> fire, and your application threads will hang indefinitely.  If you don't 
>>> drop overload, the network threads will hang trying to get a spot in the 
>>> executor queue, and will stop reading from the network.  You are back to 
>>> the first, bad solution.  
>>>
>>> Limiting concurrency at the HTTP/2 level only works per connection. 
>>>  It doesn't stop processing of existing RPCs, and it doesn't stop new RPCs 
>>> from coming in.  You will OOM.  
>>>
>>> The only other acceptable way to push back on clients is with flow 
>>> control signals.  You can refrain from calling request() on the server 
>>> call, which gRPC will convert into HTTP/2 window updates, and you can 
>>> check isReady() to see if sending would block.  
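
To make the inbound side of that concrete, here's a rough sketch of manual 
flow control in a client-streaming handler. The Record/Ack messages, the 
upload() method, and process() are made-up placeholders; the cast to 
ServerCallStreamObserver is how the stub API exposes request() and isReady():

    import io.grpc.stub.ServerCallStreamObserver;
    import io.grpc.stub.StreamObserver;

    // Fragment of a client-streaming service implementation.
    @Override
    public StreamObserver<Record> upload(StreamObserver<Ack> responseObserver) {
      ServerCallStreamObserver<Ack> serverObserver =
          (ServerCallStreamObserver<Ack>) responseObserver;
      serverObserver.disableAutoInboundFlowControl(); // stop eager request() calls
      serverObserver.request(1);                      // pull exactly one message

      return new StreamObserver<Record>() {
        @Override public void onNext(Record record) {
          process(record);           // placeholder for the real (slow) work
          serverObserver.request(1); // only now is the next message requested
        }
        @Override public void onError(Throwable t) {
          // Client cancelled or failed; nothing to clean up in this sketch.
        }
        @Override public void onCompleted() {
          responseObserver.onNext(Ack.getDefaultInstance());
          responseObserver.onCompleted();
        }
      };
    }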
>>>
>>> I think your problem is solved in ways other than the one you think you 
>>> need.  You are assuming too much about how gRPC works.  If you are afraid of 
>>> malicious users, add auth.
>>>
>>> On Monday, February 27, 2017 at 8:38:39 PM UTC-8, Ryan Michela wrote:
>>>>
>>>>> What should happen if a request comes in and the server cannot handle 
>>>>> it?  Fail the request immediately?  Queue it?  Drop the connection? 
>>>>> Queue with dropping if overloaded? 
>>>>
>>>>
>>>> By adding my own Executor to the server (ServerBuilder.executor()) I can 
>>>> control much of this. By fixing the max number of threads I can control 
>>>> the maximum concurrency. By setting the Executor's queue depth, I can 
>>>> control how many backed-up requests I'll accept before turning requesters 
>>>> away. What I can't do is control this on a service-by-service basis. I can 
>>>> only set an Executor for the Server. I'd hate to have to allocate a full 
>>>> Server and port for each unique throttling policy.
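
For reference, the executor wiring being described looks roughly like this, 
with the caveat Carl raises above that callbacks rejected by a full pool 
never fire. MyServiceImpl is a placeholder:

    import io.grpc.Server;
    import io.grpc.ServerBuilder;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class BoundedExecutorServer {
      public static void main(String[] args) throws Exception {
        // 20 worker threads and at most 10 queued tasks; beyond that the pool
        // rejects new tasks (AbortPolicy throws RejectedExecutionException).
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            20, 20, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(10),
            new ThreadPoolExecutor.AbortPolicy());

        Server server = ServerBuilder.forPort(8080)
            .executor(pool)
            .addService(new MyServiceImpl()) // placeholder service implementation
            .build()
            .start();
        server.awaitTermination();
      }
    }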
>>>>
>>>> What I'd really like is for the Task created for each request to 
>>>> include some kind of metadata about the request so that I can write my own 
>>>> intelligent Executor that enforces throttles and backlog queues on a 
>>>> service-by-service or even operation-by-operation basis.
>>>>
>>>> On Monday, February 27, 2017 at 12:53:58 PM UTC-8, Carl Mastrangelo 
>>>> wrote:
>>>>>
>>>>> What should happen if a request comes in and the server cannot handle 
>>>>> it?  Fail the request immediately?  Queue it?  Drop the connection? 
>>>>> Queue with dropping if overloaded?  You can do most of these from your 
>>>>> application without getting gRPC involved.  If you don't want to even 
>>>>> parse the request, you can disable automatic inbound flow control, and 
>>>>> simply not call request().  The data will queue up until the flow 
>>>>> control window is empty and the client will stop sending.  
>>>>>
>>>>> On Saturday, February 25, 2017 at 11:17:30 PM UTC-8, Ryan Michela 
>>>>> wrote:
>>>>>>
>>>>>> I'd like to implement concurrency throttling for my gRPC-java services 
>>>>>> so I can limit the number of concurrent executions of my service and 
>>>>>> put a reasonable cap on the queue for waiting work. One way I've found 
>>>>>> to do this is to use a custom Executor that limits the number of 
>>>>>> concurrent tasks and the task queue depth. The downside of this 
>>>>>> approach is that it applies to all services hosted in a server. I've 
>>>>>> studied the code and there does not appear to be a way for the server 
>>>>>> Executor to know which service and operation is being requested.
>>>>>>
>>>>>> What is the correct way to implement different throttle policies for 
>>>>>> individual services running in a Server? Do I really have to create a 
>>>>>> unique Server instance (with associated port) for every distinct 
>>>>>> throttle policy?
>>>>>>
>>>>>> Finally, would a PR to allow for per-service Executors be accepted?
>>>>>>
>>>>>
