Hi,

Thanks for the clear description of the architecture and the problem statement.

Issue 1: Why isn't there any RESOURCE_EXHAUSTED?

Based on the description "Requests on our gRPC microservices can take a 
while to complete. Average is 8s, up to 25s in the 99th %ile.", the 
handling logic of the RPC might be CPU-intensive. Concurrency in AsyncIO 
is achieved by coroutines yielding control back to the event loop when 
they do I/O or wait on another resource. If one coroutine (in this case, 
the RPC method handler) consumes 8s of CPU continuously, it will starve 
every other coroutine. This means that even if there are incoming 
requests, the AsyncIO stack won't have cycles to read them from the 
kernel and reject them with RESOURCE_EXHAUSTED.

If you want to increase the parallelism of CPU-intensive tasks and still 
use AsyncIO, I would recommend using the AsyncIO executors: 
https://docs.python.org/3/library/asyncio-eventloop.html#executing-code-in-thread-or-process-pools
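
For example, here is a minimal sketch of offloading the CPU-bound part of 
a handler to an executor. The servicer, method, and module names are 
placeholders (my_pb2 / my_pb2_grpc stand in for your generated protobuf 
modules), not anything from your actual setup:

import asyncio
import concurrent.futures

def heavy_computation(request):
    ...  # placeholder for the CPU-bound work

class MyServicer(my_pb2_grpc.MyServiceServicer):
    def __init__(self):
        # A process pool sidesteps the GIL for CPU-bound work; a thread
        # pool is enough if the work releases the GIL (e.g. NumPy).
        self._executor = concurrent.futures.ProcessPoolExecutor()

    async def MyMethod(self, request, context):
        loop = asyncio.get_running_loop()
        # Offload the heavy part so the event loop keeps running and can
        # still read, accept, or reject newly arriving RPCs.
        result = await loop.run_in_executor(
            self._executor, heavy_computation, request)
        return my_pb2.MyResponse(result=result)

With the blocking work out of the event loop, an 8s computation no longer 
freezes the whole process.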

Issue 2: Uneven RPC distribution.

This is controlled by the load balancer and the sidecars (linkerd in your 
setup), not by the gRPC server itself.

---

Question 1: How can I get the desired effect here?

For AsyncIO servers, you can use the AsyncIO Executors as stated above.
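
For instance, a sketch of combining the executor offload above with the 
server-level limit you already tried (again placeholder names, and a 
sketch rather than a drop-in):

import asyncio
import grpc

async def serve():
    server = grpc.aio.server(
        # With the CPU work in an executor, the event loop has cycles to
        # enforce this limit and reject overflow RPCs with
        # RESOURCE_EXHAUSTED instead of letting them pile up.
        maximum_concurrent_rpcs=1,
    )
    my_pb2_grpc.add_MyServiceServicer_to_server(MyServicer(), server)
    server.add_insecure_port("[::]:50051")
    await server.start()
    await server.wait_for_termination()

asyncio.run(serve())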

For service meshes, there are usually several timeout options. In Envoy, 
there is HttpConnectionManager.request_timeout: 
https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/network/http_connection_manager/v3/http_connection_manager.proto.
 
Envoy disarms that timer when the first byte of the response headers is 
received. So we can change the RPC service to streaming and have the 
backend send initial metadata as soon as it decides to handle the 
request. That way, request_timeout can be set to a relatively short 
duration, and when a bad assignment occurs the system recovers more 
quickly.
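
As a rough server-side sketch (a hypothetical response-streaming method, 
not your actual proto), sending initial metadata as soon as the handler 
commits to the request could look like:

class MyServicer(my_pb2_grpc.MyServiceServicer):
    async def MyStreamingMethod(self, request, context):
        # Putting the first response bytes on the wire disarms timers
        # like Envoy's request_timeout while the real work is running.
        await context.send_initial_metadata((("x-request-accepted", "1"),))
        result = await do_long_running_work(request)  # placeholder
        yield my_pb2.MyResponse(result=result)

If no early metadata arrives within the short request_timeout, the mesh 
(or the client's retry policy) can retry the request quickly.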

Question 2: Is there some way to ensure that at least the earlier requests 
don't get starved by the new requests?

In AsyncIO, only one coroutine can run at a time, so whichever handler 
holds the CPU starves the others, earlier or later. Moving the CPU-bound 
work into an executor (see above) is the way to avoid that.

Question 3: Any other advice on how to fix this issue?

See above.

Cheers,
Lidi Zheng


On Saturday, March 26, 2022 at 1:30:02 AM UTC-7 [email protected] wrote:

> Hello,
>
> *SetUp:*
> We have gRPC pods running in a k8s cluster. The service mesh we use is 
> linkerd. Our gRPC microservices are written in python (asyncio as the 
> concurrency mechanism), with the exception of the entry-point. That 
> microservice is written in golang (using gin framework). We have an AWS API 
> GW that talks to an NLB in front of the golang service. The golang service 
> communicates to the backend via nodeport services.
>
> *Problem:*
> Requests on our gRPC microservices can take a while to complete. Average 
> is 8s, up to 25s in the 99th %ile. In order to handle the load from 
> clients, we've horizontally scaled, and spawned many pods to handle 
> concurrent requests. When we send multiple requests to the system, even 
> sequentially, we sometimes notice that requests go to the same pod as an 
> ongoing request. What can happen is that this new request ends up getting 
> "queued" in the server-side (not fully "queued", some progress gets made 
> when context switches happen). The issue with queueing like this is that:
> 1. The earlier requests can start getting starved, and eventually timeout 
> (we have a hard 30s cap).
> 2. The newer requests may also not get handled on time, and as a result 
> get starved.
> The symptom we're noticing is 504s which are expected from our hard 30s 
> cap.
>
> What's strange is that we have other pods available, but for some reason 
> the loadbalancer isn't routing it to those pods smartly. It's possible that 
> linkerd's routing doesn't work well for our situation (we need to look into 
> this further, however that will require a big overhaul to our system).
>
> One thing I wanted to try doing is to stop this queuing up of requests. I 
> want the service to immediately reject the request if one is already in 
> progress, and have the client-retry. The client-retry will hopefully hit a 
> different pod (this is something I'm trying to test as a part of this). In 
> order to do this, I set the "maximum_concurrent_rpcs" to 1. When i sent 
> multiple requests in parallel to the system, I didn't see 
> any RESOURCE_EXHAUSTED exceptions.
>
> I then saw online that it may be possible for requests to get queued up on 
> the client-side as a default behavior in http/2. So I tried:
>
> channel = grpc.insecure_channel(
>     "<some address>",
>     options=[("grpc.max_concurrent_streams", 1)],
> )
>
> However, I'm not seeing any change here either. (Note, even after I get 
> this client working, eventually i'll need to make this run in golang. Any 
> help there would be appreciated as well).
>
> Questions:
> 1. How can I get the desired effect here?
> 2. Is there some way to ensure that at least the earlier requests don't 
> get starved by the new requests?
> 3. Any other advice on how to fix this issue? I'm grasping at straws here.
>
> Thank you!
>
