Netflix has released a gRPC-compatible adaptive client and server throttle for Java.

https://medium.com/@NetflixTechBlog/performance-under-load-3e6fa9a60581
https://github.com/Netflix/concurrency-limits
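That library packages the idea as adaptive client and server interceptors. For anyone landing on this thread later, Carl's simpler Semaphore suggestion below boils down to roughly the following sketch. It uses only the stock gRPC Java interceptor API; the class name, permit count, wait timeout, and RESOURCE_EXHAUSTED status are illustrative choices, not anything prescribed by gRPC or by the Netflix library.

    import java.util.concurrent.Semaphore;
    import java.util.concurrent.TimeUnit;

    import io.grpc.ForwardingServerCallListener.SimpleForwardingServerCallListener;
    import io.grpc.Metadata;
    import io.grpc.ServerCall;
    import io.grpc.ServerCallHandler;
    import io.grpc.ServerInterceptor;
    import io.grpc.Status;

    /** Shared-semaphore throttle: fails calls that cannot get a permit in time. */
    public final class ThrottleInterceptor implements ServerInterceptor {
      private final Semaphore permits;
      private final long waitMillis;

      public ThrottleInterceptor(int maxConcurrent, long waitMillis) {
        this.permits = new Semaphore(maxConcurrent);
        this.waitMillis = waitMillis;
      }

      @Override
      public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
          ServerCall<ReqT, RespT> call, Metadata headers, ServerCallHandler<ReqT, RespT> next) {
        boolean acquired;
        try {
          acquired = permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          acquired = false;
        }
        if (!acquired) {
          // Fail fast so the caller can retry against a less loaded host.
          call.close(Status.RESOURCE_EXHAUSTED.withDescription("server overloaded"),
              new Metadata());
          return new ServerCall.Listener<ReqT>() {};
        }
        // Release the permit whichever way the call ends.
        return new SimpleForwardingServerCallListener<ReqT>(next.startCall(call, headers)) {
          @Override
          public void onComplete() {
            permits.release();
            super.onComplete();
          }

          @Override
          public void onCancel() {
            permits.release();
            super.onCancel();
          }
        };
      }
    }

Share one instance across the handlers that should share the limit: register it for the whole server with ServerBuilder.intercept(new ThrottleInterceptor(20, 50)), or scope it to a single service with ServerInterceptors.intercept(service, interceptor). The latter is one way to get per-service policies without a separate Server per policy, which is the question asked at the bottom of the thread.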
On Thursday, March 2, 2017 at 9:28:29 PM UTC-8, Carl Mastrangelo wrote:
>
> Use a Semaphore and call tryAcquire() on it before calling request(). Share the semaphore between all your Server handlers. You can do this with a ServerInterceptor. If tryAcquire() times out, you can immediately fail the request. It is up to you how long you are willing to wait before dropping load.
>
> Note that calling request() causes a message to be parsed and delivered to your StreamObserver or ServerCall.Listener, whichever API you are using.
>
> That said, at least internally, many services deal with the problem of overload by using load balancing and provisioning up serving capacity. Each team has a different idea about how to handle overload, so we give you the controls to deal with it rather than make a choice that may not be applicable to all users.
>
> On Wednesday, March 1, 2017 at 1:42:25 PM UTC-8, Ryan Michela wrote:
>>
>> OK, then let's back up.
>>
>> As an example, my DBAs tell me I cannot have an unlimited number of concurrent connections to the database. Let's call the number 20. I need to restrict the number of concurrent executions of my service to no more than 20. During periods of high load, I may receive requests faster than my 20 concurrent executions can handle, causing incoming calls to back up waiting for available resources. When a backup occurs, I want to start turning requests away immediately if more than 10 requests are backed up. This way the caller can try another host rather than keep hammering the overloaded host.
>>
>> What is the correct way to implement the above throttle policy in gRPC? I need to implement a standardized throttling behavior for all my services in a way that doesn't require every developer of every service to insert the same brittle boilerplate in their services.
>>
>> On Wednesday, March 1, 2017 at 11:35:27 AM UTC-8, Carl Mastrangelo wrote:
>>>
>>> I think you are missing part of the picture. If a client sends a request to you, you have to either put it somewhere or drop it. The OS will buffer the TCP bytes, but will stall all the other incoming bytes as well. Even if the userspace application, gRPC, ignores those bytes, they are still stuck in a buffer on your machine. If this buffer grows too much, the OS is going to start dropping packets and make the connection speed plummet. The client will see this as deadline-exceeded errors. The server will see this as head-of-line blocking. This is a bad way to handle overload.
>>>
>>> If you accept the data from the OS into your userspace program, you have control over it. You can prioritize it over other requests. (Remember, HTTP/2 multiplexes many requests over a single connection. If you stall the connection, the RPCs cannot be handled.) Rather than letting clients hang, repeatedly try to reconnect, and send more data than you can buffer, you can directly fail the RPCs with an error indicating that clients should back off. It is cheap in terms of processing, and it decreases latency in overload scenarios. This is a good way to handle overload.
>>>
>>> Fixing the number of threads in the Server is tantamount to the bad way. The callbacks from the network thread (Netty in this case) will show up on the queue of the executor. The data will build up there until you OOM. If you limit the size of that queue and drop overload, callbacks will not fire, and your application threads will hang indefinitely. If you don't drop overload, the network threads will hang trying to get a spot in the executor queue and will stop reading from the network. You are back to the first, bad solution.
>>>
>>> Limiting the concurrency on an HTTP/2 basis only works per connection. It doesn't stop processing of existing RPCs, and it doesn't stop new RPCs from coming in. You will OOM.
>>>
>>> The only other acceptable way to push back on clients is with flow control signals. You can simply not call request() on the server call, which gRPC will convert to HTTP/2 window updates, and you can check isReady() to see if sending would block.
>>>
>>> I think your problem is solved in ways other than the one you think you need. You are assuming too much about how gRPC works. If you are afraid of malicious users, add auth.
>>>
>>> On Monday, February 27, 2017 at 8:38:39 PM UTC-8, Ryan Michela wrote:
>>>>
>>>>> What should happen if a request comes in and the server cannot handle it? Fail the request immediately? Queue it? Drop the connection? Queue with dropping if overloaded?
>>>>
>>>> By adding my own Executor to the server (ServerBuilder.executor()) I can control much of this. By fixing the max number of threads I can control the maximum concurrency. By setting the Executor's queue depth, I can control how many backed-up requests I'll accept before turning requesters away. What I can't do is control this on a service-by-service basis. I can only set an Executor for the Server. I'd hate to have to allocate a full Server and port for each unique throttling policy.
>>>>
>>>> What I'd really like is for the Task created for each request to include some kind of metadata about the request, so that I can write my own intelligent Executor that enforces throttles and backlog queues on a service-by-service or even operation-by-operation basis.
>>>>
>>>> On Monday, February 27, 2017 at 12:53:58 PM UTC-8, Carl Mastrangelo wrote:
>>>>>
>>>>> What should happen if a request comes in and the server cannot handle it? Fail the request immediately? Queue it? Drop the connection? Queue with dropping if overloaded? You can do most of these from your application without getting gRPC involved. If you don't even want to parse the request, you can disable automatic inbound flow control and simply not call request(). The data will queue up until the flow control window is empty and the client will stop sending.
>>>>>
>>>>> On Saturday, February 25, 2017 at 11:17:30 PM UTC-8, Ryan Michela wrote:
>>>>>>
>>>>>> I'd like to implement concurrency throttling for my gRPC-java services so I can limit the number of concurrent executions of my service and put a reasonable cap on the queue for waiting work. One way I've found to do this is to use a custom Executor that limits the number of concurrent tasks and the task queue depth. The downside of this approach is that it applies to all services hosted in a server. I've studied the code and there does not appear to be a way for the server Executor to know which service and operation is being requested.
>>>>>>
>>>>>> What is the correct way to implement different throttle policies for individual services running in a Server? Do I really have to create a unique Server instance (with associated port) for every distinct throttle policy?
>>>>>>
>>>>>> Finally, would a PR to allow for per-service Executors be accepted?
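For completeness, the ServerBuilder.executor() approach Ryan describes above looks roughly like the sketch below. The 20-thread / 10-deep-queue numbers are the ones from his example, and the class name is made up for illustration. Note Carl's caveat above: the limit applies to the whole server, and dropping executor tasks this way can leave callbacks unfired, so the interceptor approach earlier in this message is generally safer.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    import io.grpc.BindableService;
    import io.grpc.Server;
    import io.grpc.ServerBuilder;

    public final class BoundedExecutorServer {
      public static Server build(int port, BindableService service) {
        // At most 20 handler threads; at most 10 queued tasks beyond that.
        // Anything further is rejected with RejectedExecutionException.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
            20, 20,                        // fixed pool of 20 threads
            0L, TimeUnit.MILLISECONDS,     // no keep-alive needed for a fixed pool
            new ArrayBlockingQueue<>(10),  // bounded backlog
            new ThreadPoolExecutor.AbortPolicy());

        return ServerBuilder.forPort(port)
            .executor(executor)            // applies to every service on this Server
            .addService(service)
            .build();
      }
    }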
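And since a couple of the replies above mention not calling request(): below is a rough sketch of manual inbound flow control for a streaming handler. ReqT/RespT stand in for your generated message types, the helper name and Consumer callback are placeholders I made up, and the grpc-java examples include a fuller manual flow control demo.

    import java.util.function.Consumer;

    import io.grpc.stub.ServerCallStreamObserver;
    import io.grpc.stub.StreamObserver;

    public final class ManualInboundFlowControl {
      /**
       * Wraps a client-streaming or bidi handler so inbound messages are pulled
       * one at a time instead of being requested automatically by gRPC.
       * Call this at the start of your service method, before returning.
       */
      public static <ReqT, RespT> StreamObserver<ReqT> oneAtATime(
          StreamObserver<RespT> responseObserver, Consumer<ReqT> handler) {
        ServerCallStreamObserver<RespT> serverObserver =
            (ServerCallStreamObserver<RespT>) responseObserver;
        // Stop gRPC from issuing request() on our behalf.
        serverObserver.disableAutoInboundFlowControl();
        // Pull the first message ourselves.
        serverObserver.request(1);

        return new StreamObserver<ReqT>() {
          @Override public void onNext(ReqT message) {
            handler.accept(message);
            // Ask for the next message only after this one is handled. If we
            // stopped calling request(), unread data would sit in the HTTP/2
            // flow-control window and the client would eventually stop sending.
            serverObserver.request(1);
          }

          @Override public void onError(Throwable t) {
            // Client cancelled or failed; nothing more will arrive.
          }

          @Override public void onCompleted() {
            serverObserver.onCompleted();
          }
        };
      }
    }

isReady() is the companion signal for the outbound direction: it tells you whether writing another response now would just buffer it locally, so a streaming server can check it (or register setOnReadyHandler) before flooding responses.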
