Hello,

*Setup:*
We have gRPC pods running in a k8s cluster. The service mesh we use is 
linkerd. Our gRPC microservices are written in Python (using asyncio for 
concurrency), with the exception of the entry point, which is written in 
golang (using the Gin framework). We have an AWS API GW that talks to an NLB 
in front of the golang service. The golang service communicates with the 
backend via NodePort services.

*Problem:*
Requests to our gRPC microservices can take a while to complete. The average 
is 8s, up to 25s at the 99th percentile. To handle the load from clients, 
we've scaled horizontally and run many pods to serve concurrent requests. 
When we send multiple requests to the system, even sequentially, we sometimes 
notice that a request goes to the same pod as an ongoing request. What can 
happen is that this new request ends up getting "queued" on the server side 
(not fully "queued"; some progress is made when context switches happen). The 
issue with queueing like this is that:
1. The earlier requests can start getting starved and eventually time out (we 
have a hard 30s cap).
2. The newer requests may also not get handled in time and end up starved as 
well.
The symptom we're seeing is 504s, which are expected given our hard 30s cap.

What's strange is that we have other pods available, but for some reason the 
load balancer isn't routing requests to those pods. It's possible that 
linkerd's routing doesn't work well for our situation (we need to look into 
this further; however, that would require a big overhaul of our system).

One thing I wanted to try is to stop this queueing up of requests. I want the 
service to immediately reject a request if one is already in progress, and 
have the client retry. The retry will hopefully hit a different pod (this is 
something I'm trying to test as part of this). To do this, I set 
"maximum_concurrent_rpcs" to 1 on the server, roughly as in the sketch below. 
When I sent multiple requests in parallel to the system, I didn't see any 
RESOURCE_EXHAUSTED errors.
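
For context, this is roughly how the server is configured. It's only a 
minimal sketch; the servicer and generated-module names below are 
placeholders, not our actual code:

import asyncio
import grpc

# Placeholders for our real generated code and servicer implementation.
from my_service_pb2_grpc import add_MyServiceServicer_to_server
from my_servicer import MyServicer

async def serve():
    # maximum_concurrent_rpcs=1: an RPC arriving while another is in flight
    # should be rejected with RESOURCE_EXHAUSTED instead of being queued.
    server = grpc.aio.server(maximum_concurrent_rpcs=1)
    add_MyServiceServicer_to_server(MyServicer(), server)
    server.add_insecure_port("[::]:50051")
    await server.start()
    await server.wait_for_termination()

if __name__ == "__main__":
    asyncio.run(serve())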

I then saw online that requests may get queued up on the client side as 
default HTTP/2 behavior. So I tried:

import grpc

# Attempt to limit the channel to a single concurrent stream.
channel = grpc.insecure_channel(
    "<some address>",
    options=[("grpc.max_concurrent_streams", 1)],
)

However, I'm not seeing any change here either. (Note: even after I get this 
client working, I'll eventually need to do the same in golang. Any help there 
would be appreciated as well.)
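
Related to the client retry: below is the kind of retry config I'm planning 
to test once the server actually returns RESOURCE_EXHAUSTED. It's only a 
sketch based on the gRPC service-config retry policy; the backoff numbers and 
the catch-all method name are guesses rather than anything we've settled on:

import json
import grpc

# Retry policy via service config. retryableStatusCodes includes
# RESOURCE_EXHAUSTED so rejected RPCs get retried (hopefully landing on a
# different pod).
service_config = json.dumps({
    "methodConfig": [{
        "name": [{}],  # empty name entry applies to all services/methods
        "retryPolicy": {
            "maxAttempts": 4,
            "initialBackoff": "0.1s",
            "maxBackoff": "2s",
            "backoffMultiplier": 2,
            "retryableStatusCodes": ["RESOURCE_EXHAUSTED", "UNAVAILABLE"],
        },
    }]
})

channel = grpc.insecure_channel(
    "<some address>",
    options=[
        ("grpc.enable_retries", 1),
        ("grpc.service_config", service_config),
    ],
)

(If the golang client ends up needing the same thing, I believe grpc-go 
accepts an equivalent JSON via grpc.WithDefaultServiceConfig, but I haven't 
tried that yet.)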

Questions:
1. How can I get the desired effect here?
2. Is there some way to ensure that at least the earlier requests don't get 
starved by the new requests?
3. Any other advice on how to fix this issue? I'm grasping at straws here.

Thank you!
