Use a Semaphore and call tryAcquire() on it before calling request().  Share 
the semaphore between all your Server handlers.  You can do this with a 
ServerInterceptor.  If tryAcquire() times out, you can immediately fail the 
request.  It is up to you how long you are willing to wait before dropping 
load.
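
A rough sketch of that interceptor approach, purely for illustration (the 
class name, the permit count of 20, and the 50ms wait are placeholders you 
would tune; RESOURCE_EXHAUSTED is just one reasonable status to fail with):

import io.grpc.ForwardingServerCallListener.SimpleForwardingServerCallListener;
import io.grpc.Metadata;
import io.grpc.ServerCall;
import io.grpc.ServerCallHandler;
import io.grpc.ServerInterceptor;
import io.grpc.Status;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Shares one Semaphore across every call on the Server it is installed on. */
final class ConcurrencyLimitInterceptor implements ServerInterceptor {
  private final Semaphore permits = new Semaphore(20);  // max concurrent calls
  private final long maxWaitMillis = 50;                // how long to wait before dropping load

  @Override
  public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
      ServerCall<ReqT, RespT> call, Metadata headers, ServerCallHandler<ReqT, RespT> next) {
    boolean acquired;
    try {
      // With a non-direct executor this blocks an application thread, not the network thread.
      acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      acquired = false;
    }
    if (!acquired) {
      // Fail fast so the client can back off or try another host.
      call.close(Status.RESOURCE_EXHAUSTED.withDescription("too many concurrent calls"),
          new Metadata());
      return new ServerCall.Listener<ReqT>() {};
    }
    // Hold the permit for the life of the call; exactly one of onComplete/onCancel fires.
    return new SimpleForwardingServerCallListener<ReqT>(next.startCall(call, headers)) {
      @Override public void onComplete() { permits.release(); super.onComplete(); }
      @Override public void onCancel()   { permits.release(); super.onCancel(); }
    };
  }
}

Install it on each service with ServerInterceptors.intercept(service, 
interceptor).  Since the interceptor only delegates to the handler after 
acquiring a permit, a rejected RPC never has request() called on its behalf 
and its message is never parsed.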

Note that calling request() causes a message to be parsed and delivered to 
your StreamObserver or ServerCall.Listener, whichever API you are using.
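
If you are on the StreamObserver API, a corresponding sketch (EchoRequest, 
EchoReply, and the echo method are made-up names) disables automatic inbound 
flow control and only asks for a message when it has capacity; nothing is 
parsed or delivered to onNext until request() is called:

import io.grpc.stub.ServerCallStreamObserver;
import io.grpc.stub.StreamObserver;

// Inside a hypothetical bidi-streaming service implementation:
public StreamObserver<EchoRequest> echo(StreamObserver<EchoReply> responseObserver) {
  ServerCallStreamObserver<EchoReply> serverObserver =
      (ServerCallStreamObserver<EchoReply>) responseObserver;
  // Must be called before returning the request observer.
  serverObserver.disableAutoInboundFlowControl();
  serverObserver.request(1);  // pull the first message when we are ready for it
  return new StreamObserver<EchoRequest>() {
    @Override public void onNext(EchoRequest req) {
      responseObserver.onNext(EchoReply.getDefaultInstance());
      serverObserver.request(1);  // pull the next message at our own pace
    }
    @Override public void onError(Throwable t) {}
    @Override public void onCompleted() { responseObserver.onCompleted(); }
  };
}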

That said, at least internally, many services deal with the problem of 
overload by load balancing and by provisioning more serving capacity.  Each 
team has a different idea about how to handle overload, so we give you the 
controls to deal with it rather than make a choice that may not be 
applicable to all users.

On Wednesday, March 1, 2017 at 1:42:25 PM UTC-8, Ryan Michela wrote:
>
> OK, then let's back up.
>
> As an example, my DBAs tell me I cannot have unlimited number of 
> concurrent connections to the database. Let's call the number 20. I need to 
> restrict the number of concurrent executions of my service to no more than 
> 20. During periods of high load, I may receive requests faster than my 20 
> concurrent executions can handle, causing incoming calls to back up waiting 
> for available resources. When a backup occurs, I want to start turning 
> requests away immediately if more than 10 requests are backed up. This way 
> the caller can try another host rather than keep hammering the overloaded 
> host.
>
> What is the correct way to implement the above throttle policy in gRPC? I 
> need to implement a standardized throttling behavior for all my services in 
> a way that doesn't require every developer of every service to insert the 
> same brittle boilerplate in their services.
>
> On Wednesday, March 1, 2017 at 11:35:27 AM UTC-8, Carl Mastrangelo wrote:
>>
>> I think you are missing part of the picture.  If a client sends a request 
>> to you, you have to either put it somewhere, or drop it.  The OS will 
>> buffer the TCP bytes, but will stall all the other incoming bytes as well. 
>>  Even if the userspace application, gRPC, ignores those bytes, they are 
>> still stuck in a buffer on your machine.  If this buffer grows too much, 
>> the OS is going to start dropping packets, and make the connection speed 
>> plummet.  The client will see this as deadline exceeded errors.  The server 
>> will see this as head-of-line blocking.  This is a bad way to handle 
>> overload.
>>
>> If you accept the data from the OS, into your userspace program, you have 
>> control over it.  You can prioritize it over other requests.  (Remember, 
>> HTTP/2 multiplexes many requests over a single connection.  If you stall 
>> the connection, the RPCs cannot be handled).  Rather than letting clients 
>> hang and repeatedly try to reconnect, and send more data than you can 
>> buffer, you can directly fail the RPCs with an error indicating that 
>> clients should back off.  It is cheap in terms of processing, and decreases 
>> latency in overload scenarios.  This is a good way to handle overload.  
>>
>> Fixing the number of threads in the Server is tantamount to the bad way. 
>>  The callbacks from the network thread (Netty in this case) will show up on 
>> the queue of the executor.  The data will build up there until you OOM.  If 
>> you limit the size of that queue and drop overload, callbacks will not 
>> fire, and your application threads will hang indefinitely.  If you don't 
>> drop overload, the network threads will hang trying to get a spot in the 
>> executor queue, and will stop reading from the network.  You are back to 
>> the first, bad solution.  
>>
>> Limiting the concurrency on an HTTP/2 basis only works per connection. 
>>  It doesn't stop processing of existing RPCs, and it doesn't stop new RPCs 
>> from coming in.  You will OOM.  
>>
>> The only other acceptable way to push back on clients is with flow 
>> control signals.  You can simply not call request() on the server call 
>> (gRPC turns request() calls into HTTP/2 window updates), and you can check 
>> isReady() to see if sending would block.  
>>
>> I think your problem is solved in ways other than the one you think you 
>> need.  You are assuming too much about how gRPC works.  If you are afraid of 
>> malicious users, add auth.
>>
>> On Monday, February 27, 2017 at 8:38:39 PM UTC-8, Ryan Michela wrote:
>>>
>>> What should happen if a request comes in and the server cannot handle 
>>>> it?  Fail the request immediately?  Queue it?  Drop the connection?  Queue 
>>>> with dropping if overloaded? 
>>>
>>>
>>> By adding my own Executor to the server (ServerBuilder.executor()) I can 
>>> control much of this. By fixing the max number of threads I can control the 
>>> maximum concurrency. By setting the Executor's queue depth, I can control 
>>> how many backed up requests I'll accept before turning requesters away. 
>>> What I can't do is control this on a service-by-service basis. I can only 
>>> set an Executor for the Server. I'd hate to have to allocate a full Server 
>>> and port for each unique throttling policy.
>>>
>>> What I'd really like is for the Task created for each request to include 
>>> some kind of metadata about the request so that I can write my own 
>>> intelligent Executor that enforces throttles and backlog queues on a 
>>> service-by-service or even operation-by-operation basis.
>>>
>>> On Monday, February 27, 2017 at 12:53:58 PM UTC-8, Carl Mastrangelo 
>>> wrote:
>>>>
>>>> What should happen if a request comes in and the server cannot handle 
>>>> it?  Fail the request immediately?  Queue it?  Drop the connection?  Queue 
>>>> with dropping if overloaded?  You can do most of these from your 
>>>> application without getting gRPC involved.  If you don't want to even 
>>>> parse the request, you can disable automatic inbound flow control, and 
>>>> simply not call request().  The data will queue up until the flow control 
>>>> window is empty and the client will stop sending.  
>>>>
>>>> On Saturday, February 25, 2017 at 11:17:30 PM UTC-8, Ryan Michela wrote:
>>>>>
>>>>> I'd like to implement concurrency throttling for my gRPC-java services 
>>>>> so I can limit the number of concurrent executions of my service and put 
>>>>> a reasonable cap on the queue for waiting work. One way I've found to do 
>>>>> this is to use a custom Executor that limits the number of concurrent 
>>>>> tasks and task queue depth. The downside of this approach is that it 
>>>>> applies to all services hosted in a server. I've studied the code and 
>>>>> there does not appear to be a way for the server Executor to know which 
>>>>> service and operation is being requested.
>>>>>
>>>>> What is the correct way to implement different throttle policies for 
>>>>> individual services running in a Server? Do I really have to create a 
>>>>> unique Server instance (with associated port) for every distinct 
>>>>> throttle policy?
>>>>>
>>>>> Finally, would a PR to allow for per-service Executors be accepted?
>>>>>
>>>>
