Hello,
*Setup:*
We have gRPC pods running in a k8s cluster, with Linkerd as the service mesh.
Our gRPC microservices are written in Python (using asyncio for concurrency),
with the exception of the entry point, which is written in Go (using the Gin
framework). An AWS API Gateway talks to an NLB in front of the Go service, and
the Go service communicates with the backend via NodePort services.
*Problem:*
Requests to our gRPC microservices can take a while to complete: the average is
8s, and the 99th percentile reaches 25s. To handle the load from clients, we've
scaled horizontally and spawned many pods to handle concurrent requests. When
we send multiple requests to the system, even sequentially, we sometimes notice
that a request is routed to the same pod as an ongoing request. This new
request can end up effectively "queued" on the server side (not fully queued,
since some progress is made whenever a context switch happens). The problem
with this queueing is that:
1. The earlier requests can start getting starved and eventually time out
(we have a hard 30s cap).
2. The newer requests may also not get handled in time, and as a result get
starved as well.
The symptom we're seeing is 504s, which are expected given our hard 30s cap.
What's strange is that other pods are available, but for some reason the load
balancer isn't routing requests to them. It's possible that Linkerd's routing
doesn't work well for our situation (we need to look into this further, but
that would require a big overhaul of our system).
One thing I wanted to try is to stop this queueing of requests altogether. I
want the service to immediately reject a request if one is already in
progress, and have the client retry. The retry will hopefully land on a
different pod (that's part of what I'm trying to test here). To do this, I set
"maximum_concurrent_rpcs" to 1 on the server. However, when I sent multiple
requests in parallel to the system, I didn't see any RESOURCE_EXHAUSTED errors.
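For clarity, the behavior I'm after is roughly what this hand-rolled version
would do (just a sketch; the servicer/method names, the sleep, and the echoed
response are placeholders for our generated code and real work):

import asyncio

import grpc


class MyServicer:  # in practice this would subclass our generated *Servicer base class
    def __init__(self) -> None:
        # Exactly one in-flight RPC per process; anything beyond that is rejected.
        self._busy = asyncio.Semaphore(1)

    async def LongRunningCall(self, request, context):
        if self._busy.locked():
            # Fail fast with RESOURCE_EXHAUSTED so the client can retry
            # (and hopefully land on a different pod) instead of queueing here.
            await context.abort(grpc.StatusCode.RESOURCE_EXHAUSTED, "pod busy, please retry")
        async with self._busy:
            await asyncio.sleep(10)  # stand-in for the real 8-25s of work
            return request           # stand-in for building the real response

Ideally maximum_concurrent_rpcs would give me this behavior for free rather
than having to hand-roll it in every method.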
I then read online that requests may get queued up on the client side as a
default behavior in HTTP/2. So I tried:
channel = grpc.insecure_channel(
    "<some address>",
    options=[("grpc.max_concurrent_streams", 1)],
)
However, I'm not seeing any change here either. (Note: even after I get this
Python client working, I'll eventually need to do the same from Go. Any help
there would be appreciated as well.)
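In case it helps to see the whole idea, the client-side retry I have in mind
looks roughly like this (again a sketch; the service name is a placeholder,
and I haven't verified this is the right way to pair it with the server-side
rejection):

import json

import grpc

# Retry RESOURCE_EXHAUSTED (and UNAVAILABLE) so a rejected request gets re-sent
# and hopefully lands on a different pod. "my.package.MyService" is a placeholder.
service_config = json.dumps({
    "methodConfig": [{
        "name": [{"service": "my.package.MyService"}],
        "retryPolicy": {
            "maxAttempts": 5,
            "initialBackoff": "0.1s",
            "maxBackoff": "1s",
            "backoffMultiplier": 2,
            "retryableStatusCodes": ["RESOURCE_EXHAUSTED", "UNAVAILABLE"],
        },
    }]
})

channel = grpc.insecure_channel(
    "<some address>",
    options=[
        ("grpc.enable_retries", 1),  # retries should be on by default in recent grpc versions
        ("grpc.service_config", service_config),
    ],
)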
Questions:
1. How can I get the desired effect here?
2. Is there some way to ensure that at least the earlier requests don't get
starved by the new requests?
3. Any other advice on how to fix this issue? I'm grasping at straws here.
Thank you!