The "serverCallContext.CancellationToken" is triggered any time the
server-side call terminates prematurely with an error (it doesn't
necessarily mean there was a cancellation, it happens anytime there's some
error). Based on what you're saying (it takes hours of heavy load to
reproduce), it might well just be that this is a bug in C# or C-core stack,
that rarely gets triggered - but it's hard to tell with certainty unless
you're able to come up with some evidence (like traces showing what exactly
went wrong).
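
If it helps with collecting that evidence, here is a minimal sketch of the
kind of logging I mean, assuming the Grpc.Core server API (the service,
handler and message names and the Log() helper are placeholders, not your
actual code). Pairing it with C-core tracing (GRPC_VERBOSITY=DEBUG,
GRPC_TRACE=api,op_failure) should show what actually terminated the call:

using System;
using System.Threading.Tasks;
using Grpc.Core;

// Fragment of the generated service implementation (names are made up).
public class ControlService : Control.ControlBase
{
    public override async Task ControlStream(ControlRequest request,
        IServerStreamWriter<ControlMessage> responseStream,
        ServerCallContext context)
    {
        context.CancellationToken.Register(() =>
        {
            // Fires on any premature termination of the call, not only on
            // an explicit cancel from the client.
            // Log() stands in for whatever logging you already use.
            Log($"{context.Method} from {context.Peer} terminated at " +
                $"{DateTime.UtcNow:o} (deadline was {context.Deadline:o})");
        });

        // ... the rest of the handler awaits writes on responseStream ...
    }
}

Correlating those timestamps with the client-side logs and the C-core
traces should at least tell you which side initiated the termination.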
Can you reproduce this on a single machine (that would rule out network
problems)? Can you reproduce it with a much lower number of concurrently
open streams? Can you reproduce it when only using C++ clients and servers
(that would point to a potential problem in C-core)?
Have you checked the values of the relevant channel arguments? (e.g.
https://github.com/grpc/grpc/blob/master/include/grpc/impl/codegen/grpc_types.h#L146)
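
For reference, in C# those channel arguments can be passed as ChannelOptions
when constructing the server and the channels. A sketch, assuming Grpc.Core
(the host/port and the keepalive values below are placeholders, not a
recommendation):

using Grpc.Core;

class ChannelArgsExample
{
    static void Main()
    {
        // Channel arguments from grpc_types.h are passed by their string names.
        var options = new[]
        {
            new ChannelOption("grpc.keepalive_time_ms", 30000),
            new ChannelOption("grpc.keepalive_timeout_ms", 10000),
            new ChannelOption("grpc.http2.max_pings_without_data", 0),
        };

        // The same options collection can be applied on both sides.
        var server = new Server(options);   // add Services/Ports as usual
        var channel = new Channel("serverhost", 50051,
            ChannelCredentials.Insecure, options);
    }
}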
On Monday, September 18, 2017 at 3:25:18 PM UTC+2, Amit Waisel wrote:
>
> I encountered a weird behavior in gRPC.
>
>
> *The symptom* - an active RPC stream is signaled as cancelled on the
> server side (this happens from time to time; I couldn't find any
> correlation with other events in the environment), although the client is
> active and the stream shouldn't be closed.
>
> It happens to streams opened as response streams of RPC calls from both
> C++ and NodeJS clients. *It happened on gRPC 1.3.6 and still happens on
> gRPC v1.6.0*.
>
> The problem does not reproduce easily - the system has to run under heavy
> load for many hours before this happens.
>
>
> In my code, I have 2 main types of streams:
>
>
> 1. Control stream (C++→C#) - the client initiates an RPC call to the
> server, which keeps the RPC's response stream open.
> Those streams are used as *control channels* with the *C++ clients* and
> are kept open to allow server-to-client requests. When they are closed,
> both client and server clean up all data related to the connection, so
> the control stream is critical to the session.
> The server registers for the call cancellation notification:
> ServerCallContext context; // received as a parameter of the RPC handler
> // ...
> context.CancellationToken.Register(() =>
>     System.Threading.ThreadPool.QueueUserWorkItem(
>         async obj => { handle_disconnection(...); }));
>
> The total number of open control streams (i.e. the number of connected
> C++ clients) is ~1200.
> 2. Command stream (NodeJS→C#) - there are many other streams for
> server-to-client command-response communication, which are kept open in
> parallel by the server with *NodeJS clients*. The total number of open
> streams is 20K-30K.
>
> The problem is noticeable when the control streams get disconnected.
>
> *Further investigation of the client (C++) and server (C#) logs around a
> control stream disconnection revealed the following*:
>
>
> 1. For some reason, the server's cancellation token (the one
> registered above) is signaled - and the server does its cleanup
> (`handle_disconnection`, which also intentionally closes many command
> streams). *According to the client, the connection should have
> remained open.*
> 2. After some time, the client realizes the connection was closed
> unexpectedly and does its own cleanup - throwing the error I posted here
> <https://github.com/grpc/grpc/issues/12425#issuecomment-329958701>
> (NodeJS in that case). *The client disconnects itself only after the
> server closes the connection and the control stream.*
>
> Another note - I set the server's RequestCallTokensPerCompletionQueue
> value to 32768 (32K) per completion queue, for both the C++ and NodeJS
> client interfaces.
>
> I have 2 server interfaces (for the NodeJS clients and the C++ clients,
> which have different APIs) and 4 completion queues (on an 8-core
> machine). I don't really know whether the 4 completion queues are global
> or per-server.
>
> *Do you think it might cause those streams to be closed under heavy load*?
>
>
>
> In any case, my suspicion is about the C# server's behavior - the
> CancellationToken is signaled for no apparent reason.
>
> I *didn't* rule out network instability yet - although both the clients
> and the server are located on the same ESX host with 10-gig virtual
> adapters between them, so this is quite a long shot.
>
>
>
> Do you have any idea how to solve this?
>
> Thanks!
>