I encountered a weird behavior in gRPC.
*The symptom* - an active RPC stream is signaled as cancelled on the
server side, even though the client is still active and the stream should
not be closed. It happens intermittently, and I couldn't find any
correlation with other events in the environment.
It happens for streams initiated as response streams of RPC calls from
both C++ and NodeJS clients. *It happened on gRPC v1.3.6 and still happens
on gRPC v1.6.0.*
The problem does not reproduce easily - the system has to run under heavy
load for many hours before it happens.
In my code, I have 2 main types of streams:
1. Control stream (C++→C#) - the client initiates an RPC call to the
server, and the server keeps the RPC's response stream open.
Those streams are used as *control channels* with the *C++ clients* and
are kept open to allow server-to-client requests. When one is closed,
both client and server clean up all data related to that connection, so
the control stream is critical to the session.
The server registers for the call-cancellation notification (a fuller
sketch of the handler follows after this list):
ServerCallContext context; // received as a parameter of the RPC handler
// ...
context.CancellationToken.Register(
    () => System.Threading.ThreadPool.QueueUserWorkItem(
        async obj => { handle_disconnection(...); }));
The total number of open control streams (i.e. the number of connected
C++ clients) is ~1200.
2. Command stream (NodeJS→C#) - there are many other streams for
server-to-client command/response communication, which the server keeps
open in parallel with the *NodeJS clients*. The total number of open
streams is 20K-30K.
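For context, here is a minimal sketch of what such a control-stream
handler looks like on the server. The service/message names and
HandleDisconnection are placeholders for my actual code, not the real API:

using System.Threading;
using System.Threading.Tasks;
using Grpc.Core;

// "Control" is a placeholder service name; Control.ControlBase stands in
// for the generated service base class.
public class ControlService : Control.ControlBase
{
    public override async Task OpenControlStream(
        ControlRequest request,
        IServerStreamWriter<ControlMessage> responseStream,
        ServerCallContext context)
    {
        var done = new TaskCompletionSource<object>();

        // Fires whenever the call is cancelled - client cancel, transport
        // failure, deadline, or server shutdown.
        context.CancellationToken.Register(() =>
        {
            ThreadPool.QueueUserWorkItem(_ => HandleDisconnection(context.Peer));
            done.TrySetResult(null);
        });

        // Keep the RPC (and its response stream) open; server-to-client
        // requests are written to responseStream elsewhere.
        await done.Task;
    }

    private void HandleDisconnection(string peer)
    {
        // Placeholder for the per-connection cleanup described above.
    }
}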
The problem is noticeable when the control streams get disconnected.
*Further investigation of the client (C++) and server (C#) logs around a
control stream disconnection revealed the following:*
1. For some reason, the server's cancellation token (the one registered
above) is signaled, and the server does its cleanup
(`handle_disconnection`, which also intentionally closes many command
streams). *According to the client, the connection should have
remained open.*
2. After some time, the client realizes the connection was closed
unexpectedly and does its own cleanup, throwing the error I posted here
<https://github.com/grpc/grpc/issues/12425#issuecomment-329958701>
(NodeJS in that case). *The client disconnects itself only after the
server closes the connection and control stream.*
Another note - I set the server's RequestCallTokensPerCompletionQueue to
32768 (32K) per completion queue for both the C++ and NodeJS client
interfaces. I have 2 server interfaces (one for Node clients and one for
C++ clients, which have different APIs) and 4 completion queues (on an
8-core machine). I don't know whether the 4 completion queues are global
or per-server.
*Do you think it might cause those streams to be closed under heavy load*?
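For reference, the setup looks roughly like this (a sketch; the port and
service wiring are placeholders, and I'm assuming both interfaces are
services on a single Grpc.Core Server):

using Grpc.Core;

// Must run before the gRPC environment is created by any channel/server.
GrpcEnvironment.SetCompletionQueueCount(4);

var server = new Server
{
    Services =
    {
        Control.BindService(new ControlService()),  // C++ clients
        Command.BindService(new CommandService())   // NodeJS clients
    },
    Ports = { new ServerPort("0.0.0.0", 50051, ServerCredentials.Insecure) },
    RequestCallTokensPerCompletionQueue = 32768
};
server.Start();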
In any case, my suspicion falls on the C# server's behavior - the
CancellationToken is signaled for no apparent reason.
I *haven't* ruled out network instability yet - although both the clients
and the server are located on the same ESX host with 10-gig virtual
adapters between them, so it's quite a long shot.
Do you have any idea how to solve this?
Thanks!