Hi,

I opened an issue (https://github.com/grpc/grpc/issues/22698) but didn't 
get response so asking here. 


What version of gRPC and what language are you using?

v1.19.1, C++
What operating system (Linux, Windows,...) and version?

Linux
What runtime / compiler are you using (e.g. python version or version of 
gcc)What did you do?

We have synchronous client and synchronous server applications. In our 
case, client streams data and server listens most of the time. Client 
closes connection after 1 or 2 days based on configs.
In our client, on disconnect with server we check for channel status and 
reconnect to server.
We have seen issues with connection drops and client blocks forever. To 
deal with it, we enabled keep-alive pings following this document (
https://github.com/grpc/grpc/blob/master/doc/keepalive.md). The parameters 
we used for both client and server are:

GRPC_ARG_KEEPALIVE_TIME_MS = 10000
GRPC_ARG_KEEPALIVE_TIMEOUT_MS = 2000
GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS = 1
GRPC_ARG_HTTP2_MIN_SENT_PING_INTERVAL_WITHOUT_DATA_MS = 10000
GRPC_ARG_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS = 5000
GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA = 0

(I don't have any other GRPC variables set other than these)


What did you expect to see?

We expect to see client and server to send keepalive packets periodically 
and only disconnect if the connection between them or other end is down.
What did you see instead?

I see periodically (10 mins) the connection at client is dropped and our 
retry logic connects again.

At Client:

machine-host:/home/abc# GRPC_VERBOSITY=debug ./client
D0417 02:26:19.773717438   21870 ev_posix.cc:170]            Using polling 
engine: epollex
D0417 02:26:19.774037411   21870 dns_resolver_ares.cc:488]   Using ares dns 
resolver
I0417 02:26:19.775079731   21870 socket_utils_common_posix.cc:349] Disabling 
AF_INET6 sockets because ::1 is not available.
I0417 02:26:19.776441917   21870 subchannel.cc:1025]         New connected 
subchannel at 0x3fff74020830 for subchannel 0x3fff740210a0
E0417 02:36:21.775425004   21898 chttp2_transport.cc:2765]   ipv4:3.1.1.2:9011: 
Keepalive watchdog fired. Closing transport.
I0417 02:36:39.774677222   21898 subchannel.cc:1025]         New connected 
subchannel at 0x3fff58001980 for subchannel 0x3fff740210a0
E0417 02:46:41.774407146   21898 chttp2_transport.cc:2765]   ipv4:3.1.1.2:9011: 
Keepalive watchdog fired. Closing transport.
I0417 02:46:59.774638182   21898 subchannel.cc:1025]         New connected 
subchannel at 0x3fff50001170 for subchannel 0x3fff740210a0
E0417 02:57:01.774407382   21898 chttp2_transport.cc:2765]   ipv4:3.1.1.2:9011: 
Keepalive watchdog fired. Closing transport.
I0417 02:57:19.774682903   21898 subchannel.cc:1025]         New connected 
subchannel at 0x3fff58002450 for subchannel 0x3fff740210a0

At Server:

machine-host:/home/abc# GRPC_VERBOSITY=debug ./grpc_listener 3.1.1.2 9011
D0417 02:26:11.047268367    8603 ev_posix.cc:170]            Using polling 
engine: epollex
D0417 02:26:11.047769221    8603 dns_resolver_ares.cc:488]   Using ares dns 
resolver
I0417 02:26:11.047961621    8603 server_builder.cc:281]      Synchronous 
server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
I0417 02:26:11.048415621    8603 socket_utils_common_posix.cc:349] Disabling 
AF_INET6 sockets because ::1 is not available.
<no other grpc logs printed>

Whenever this reconnect happens we see increment in eventfd and eventpoll 
inodes of client application (2 new inodes created). As the connection drop 
and reconnect happens for every 10 mins, we see 2 new inodes created by 
client. Due to this, once we hit the max (1024) file descriptors our 
program is crashing. Similar to this issue: #19767 
<https://github.com/grpc/grpc/issues/19767> [same stack trace seen]

I have few questions.

   1. Do HTTP keep-alive pings exchange happen in synchronous server and 
   synchronous client environment? I am asking this because, I see one 
   suggestion (by GRPC contributor) to move to async client and server here: 
   #19711 <https://github.com/grpc/grpc/issues/19711>?
   2. Is the issue (#19767 <https://github.com/grpc/grpc/issues/19767>) 
   fixes in later versions. If yes, may I know which version it is fixed?
   3. May I know the reason for the connection break every 10 minutes, Am I 
   setting any parameters wrong?

Anything else we should know about your project / environment?

No


Continuation:


To experiment further to see keep alive packets in tcpdump, I have set 
following keep alive params at client end and nothing at client end:

GRPC_ARG_KEEPALIVE_TIME_MS = 10000
GRPC_ARG_KEEPALIVE_TIMEOUT_MS = 2000
GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS = 1


I expect here that keep-alive packets are exchanged every 10 secs but I 
never saw any packet exchange b/w client and server except data and after 5 
minutes or so, I see some packets flow (length 17, 60 and 13 shown below) 
and then FINISH packet is sent from client to server. I am not sure what 
the other packets that flown between client and server. Are they http2 ping 
packets?


3.1.1.1.43788 -> client

3.1.1.2.9011 -> server


23:13:36.774898 IP 3.1.1.1.43788 > 3.1.1.2.9011: Flags [P.], seq 943:1213, 
ack 90, win 115, options [nop,nop,TS val 2434255 ecr 2409302], length 270 
----------------> This is payload data from client to server.

23:13:36.774960 IP 3.1.1.2.9011 > 3.1.1.1.43788: Flags [.], ack 1213, win 
130, options [nop,nop,TS val 2434294 ecr 2434255], length 0

23:13:41.369246 IP 3.1.1.1.43788 > 3.1.1.2.9011: Flags [P.], seq 1213:1230, 
ack 90, win 115, options [nop,nop,TS val 2435403 ecr 2434294], length 17

23:13:41.369308 IP 3.1.1.2.9011 > 3.1.1.1.43788: Flags [.], ack 1230, win 
130, options [nop,nop,TS val 2435443 ecr 2435403], length 0

23:13:41.369730 IP 3.1.1.2.9011 > 3.1.1.1.43788: Flags [P.], seq 90:150, 
ack 1230, win 130, options [nop,nop,TS val 2435443 ecr 2435403], length 60

23:13:41.370142 IP 3.1.1.1.43788 > 3.1.1.2.9011: Flags [.], ack 150, win 
115, options [nop,nop,TS val 2435403 ecr 2435443], length 0

23:13:43.369560 IP 3.1.1.1.43788 > 3.1.1.2.9011: Flags [P.], seq 1230:1243, 
ack 150, win 115, options [nop,nop,TS val 2435903 ecr 2435443], length 13

23:13:43.369620 IP 3.1.1.1.43788 > 3.1.1.2.9011: Flags [F.], seq 1243, ack 
150, win 115, options [nop,nop,TS val 2435903 ecr 2435443], length 0 
----------------> 
This is finish from client to server.

23:13:43.369929 IP 3.1.1.2.9011 > 3.1.1.1.43788: Flags [F.], seq 150, ack 
1244, win 130, options [nop,nop,TS val 2435943 ecr 2435903], length 0

23:13:43.370380 IP 3.1.1.1.43788 > 3.1.1.2.9011: Flags [.], ack 151, win 
115, options [nop,nop,TS val 2435904 ecr 2435943], length 0


After this, reconnect happens and those packets are seen. Same sequence 
happens for every 5 minutes.

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/ac78f758-47ef-48ea-b2e7-a162012e32cc%40googlegroups.com.

Reply via email to