I think you are missing part of the picture. If a client sends a request to you, you have to either put it somewhere or drop it. The OS will buffer the TCP bytes, but it will stall all the other incoming bytes as well. Even if the userspace application, gRPC, ignores those bytes, they are still stuck in a buffer on your machine. If this buffer grows too much, the OS is going to start dropping packets, and connection throughput will plummet. The client will see this as deadline-exceeded errors. The server will see this as head-of-line blocking. This is a bad way to handle overload.
If you accept the data from the OS into your userspace program, you have control over it. You can prioritize it over other requests. (Remember, HTTP/2 multiplexes many requests over a single connection; if you stall the connection, none of those RPCs can be handled.) Rather than letting clients hang, repeatedly try to reconnect, and send more data than you can buffer, you can directly fail the RPCs with an error indicating that clients should back off (a rough sketch of this follows the quoted thread below). It is cheap in terms of processing, and it decreases latency in overload scenarios. This is a good way to handle overload.

Fixing the number of threads in the Server is tantamount to the bad way. The callbacks from the network thread (Netty in this case) will show up on the queue of the executor. The data will build up there until you OOM. If you limit the size of that queue and drop the overflow, callbacks will not fire, and your application threads will hang indefinitely. If you don't drop the overflow, the network threads will hang trying to get a spot in the executor queue and will stop reading from the network. You are back to the first, bad solution.

Limiting concurrency at the HTTP/2 level only works per connection. It doesn't stop processing of existing RPCs, and it doesn't stop new RPCs from coming in. You will OOM.

The only other acceptable way to push back on clients is with flow-control signals. You can hold off on calling request() on the server call, which gRPC converts into HTTP/2 window updates, and you can check isReady() to see whether sending would block (also sketched below).

I think your problem is solved in ways other than the one you think you need. You are assuming too much about how gRPC works. If you are afraid of malicious users, add auth.

On Monday, February 27, 2017 at 8:38:39 PM UTC-8, Ryan Michela wrote:
>
> What should happen if a request comes in and the server cannot handle it?
>> Fail the request immediately? Queue it? Drop the connection? Queue with
>> dropping if overloaded?
>
>
> By adding my own Executor to the server (ServerBuilder.executor()) I can
> control much of this. By fixing the max number of threads I can control the
> maximum concurrency. By setting the Executor's queue depth, I can control
> how many backed up requests I'll accept before turning requesters away.
> What I can't do is control this on a service-by-service basis. I can only
> set an Executor for the Server. I'd hate to have to allocate a full Server
> and port for each unique throttling policy.
>
> What I'd really like is for the Task created for each request to include
> some kind of metadata about the request so that I can write my own
> intelligent Executor that enforces throttles and backlog queues on a
> service-by-service or even operation-by-operation basis.
>
> On Monday, February 27, 2017 at 12:53:58 PM UTC-8, Carl Mastrangelo wrote:
>>
>> What should happen if a request comes in and the server cannot handle it?
>> Fail the request immediately? Queue it? Drop the connection? Queue with
>> dropping if overloaded? You can do most of these from your application
>> without getting gRPC involved. If you don't want to even parse the
>> request, you can disable automatic inbound flow control, and simply not
>> call request. The data will queue up until the flow control window is
>> empty and the client will stop sending.
>>
>> On Saturday, February 25, 2017 at 11:17:30 PM UTC-8, Ryan Michela wrote:
>>>
>>> I'd like to implement concurrency throttling for my gRPC-java services
>>> so I can limit the number of concurrent executions of my service and put a
>>> reasonable cap on the queue for waiting work. One way I've found to do this
>>> is to use a custom Executor that limits the number of concurrent tasks and
>>> task queue depth. The downside of this approach is that it applies to all
>>> services hosted in a server. I've studied the code and there does not
>>> appear to be a way for the server Executor to know which service and
>>> operation is being requested.
>>>
>>> What is the correct way to implement different throttle policies for
>>> individual services running in a Server? Do I really have to create a unique
>>> Server instance (with associated port) for every distinct throttle policy?
>>>
>>> Finally, would a PR to allow for per-service Executors be accepted?
>>>
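A minimal sketch of the fail-fast approach mentioned above, written as a ServerInterceptor so it can also be attached per service without needing a per-service Executor. The class name and the limit are made up, and the counter is global to the interceptor instance; this is only a sketch, not the one true implementation:

import io.grpc.ForwardingServerCallListener.SimpleForwardingServerCallListener;
import io.grpc.Metadata;
import io.grpc.ServerCall;
import io.grpc.ServerCallHandler;
import io.grpc.ServerInterceptor;
import io.grpc.Status;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical interceptor: rejects new RPCs once too many are in flight,
// instead of letting them pile up in an executor queue.
public final class ConcurrencyLimitInterceptor implements ServerInterceptor {
  private final int maxConcurrentCalls;                 // made-up knob; tune per service
  private final AtomicInteger inFlight = new AtomicInteger();

  public ConcurrencyLimitInterceptor(int maxConcurrentCalls) {
    this.maxConcurrentCalls = maxConcurrentCalls;
  }

  @Override
  public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
      ServerCall<ReqT, RespT> call, Metadata headers, ServerCallHandler<ReqT, RespT> next) {
    if (inFlight.incrementAndGet() > maxConcurrentCalls) {
      inFlight.decrementAndGet();
      // Fail fast: the client gets RESOURCE_EXHAUSTED and can back off.
      call.close(Status.RESOURCE_EXHAUSTED.withDescription("server overloaded"), new Metadata());
      return new ServerCall.Listener<ReqT>() {};        // no-op listener
    }
    return new SimpleForwardingServerCallListener<ReqT>(next.startCall(call, headers)) {
      @Override
      public void onComplete() {
        inFlight.decrementAndGet();
        super.onComplete();
      }

      @Override
      public void onCancel() {
        inFlight.decrementAndGet();
        super.onCancel();
      }
    };
  }
}

You could attach one instance per service with ServerInterceptors.intercept(service, new ConcurrencyLimitInterceptor(64)) when adding the service to the ServerBuilder, or key limits on call.getMethodDescriptor().getFullMethodName() for operation-by-operation policies.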

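And a rough sketch of the flow-control route for a streaming handler, using disableAutoInboundFlowControl() plus isReady()/request(1). The EchoRequest/EchoReply types and the echo method are placeholders for whatever your generated service defines:

import io.grpc.stub.ServerCallStreamObserver;
import io.grpc.stub.StreamObserver;
import java.util.concurrent.atomic.AtomicBoolean;

// Inside a (hypothetical) generated service implementation.
public StreamObserver<EchoRequest> echo(StreamObserver<EchoReply> responseObserver) {
  final ServerCallStreamObserver<EchoReply> serverObserver =
      (ServerCallStreamObserver<EchoReply>) responseObserver;
  // Take over inbound flow control: gRPC will no longer call request() for us.
  serverObserver.disableAutoInboundFlowControl();

  final AtomicBoolean wasReady = new AtomicBoolean(false);
  serverObserver.setOnReadyHandler(() -> {
    // Fires when the outbound transport buffer drains; resume pulling messages.
    if (serverObserver.isReady() && wasReady.compareAndSet(false, true)) {
      serverObserver.request(1);
    }
  });

  return new StreamObserver<EchoRequest>() {
    @Override
    public void onNext(EchoRequest req) {
      serverObserver.onNext(EchoReply.newBuilder().setMessage(req.getMessage()).build());
      if (serverObserver.isReady()) {
        serverObserver.request(1);  // pull the next message only after handling this one
      } else {
        // Sending would block. Stop requesting: the HTTP/2 window is not
        // replenished, so the client stops sending until onReady fires again.
        wasReady.set(false);
      }
    }

    @Override
    public void onError(Throwable t) {
      // Client cancelled or the transport failed; nothing more to send.
    }

    @Override
    public void onCompleted() {
      responseObserver.onCompleted();
    }
  };
}

Because request(1) is only issued while isReady() is true, the flow-control window stops being replenished as soon as the server falls behind, and the client's sends stall at the transport instead of piling up in server memory.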