tvalentyn commented on code in PR #36528:
URL: https://github.com/apache/beam/pull/36528#discussion_r2482116758


##########
sdks/python/apache_beam/runners/worker/channel_factory.py:
##########
@@ -23,8 +23,24 @@
 
 class GRPCChannelFactory(grpc.StreamStreamClientInterceptor):
   DEFAULT_OPTIONS = [
-      ("grpc.keepalive_time_ms", 20000),
-      ("grpc.keepalive_timeout_ms", 300000),
+      # Default: 30000ms (30s), increased to 180s to reduce ping frequency
+      ("grpc.keepalive_time_ms", 180000),
+      # Default: 5000ms (5s), increased to 10 minutes for stability
+      ("grpc.keepalive_timeout_ms", 600000),

Review Comment:
   > Also note that because this value is now larger than 
grpc.keepalive_time_ms, it'll result in 3 outstanding keepalive pings,
   
   Per the [comment 
above](https://github.com/apache/beam/pull/36528/files#r2482014530), 
`keepalive_time_ms` is INT_MAX.
   
   Increasing the timeout from 5 min to 10 min shouldn't be a problem. We don't 
expect dead channels, and if there were one, failing earlier likely wouldn't 
help unless the system can somehow recover from a dead channel. If instead the 
channel is alive but the system is overloaded and a process is slow to respond, 
and the longer timeout reduces flakiness, then I think the increase is fine.
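
   To make the trade-off concrete, here is a rough sketch of how long a dead 
channel could go unnoticed before and after the change. It assumes the usual 
reading of these options (a ping is sent after `keepalive_time_ms` of 
inactivity, and the connection is closed if no ack arrives within 
`keepalive_timeout_ms`). The helper `worst_case_detection_ms` is hypothetical 
and not part of Beam or gRPC; the option values are the ones in the diff as 
written, and this does not account for the INT_MAX point above.

```python
# Sketch only: treats the keepalive options from the diff as plain numbers.
OLD_OPTIONS = [
    ("grpc.keepalive_time_ms", 20000),
    ("grpc.keepalive_timeout_ms", 300000),
]
NEW_OPTIONS = [
    ("grpc.keepalive_time_ms", 180000),
    ("grpc.keepalive_timeout_ms", 600000),
]


def worst_case_detection_ms(options):
    """Hypothetical helper: idle time before a ping plus the wait for its ack.

    Assumes a ping is sent after keepalive_time_ms of inactivity and the
    connection is closed if no ack arrives within keepalive_timeout_ms.
    """
    opts = dict(options)
    return opts["grpc.keepalive_time_ms"] + opts["grpc.keepalive_timeout_ms"]


old_ms = worst_case_detection_ms(OLD_OPTIONS)
new_ms = worst_case_detection_ms(NEW_OPTIONS)
print(f"old: {old_ms / 60000:.1f} min, new: {new_ms / 60000:.1f} min")
```

   Under those assumptions the window grows from roughly 5.3 min to 13 min. 
The same list of tuples is what gets passed as `options=` when a channel is 
created, e.g. `grpc.insecure_channel(target, options=NEW_OPTIONS)`.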



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
