scwhittle commented on code in PR #36528:
URL: https://github.com/apache/beam/pull/36528#discussion_r2491978702


##########
sdks/python/apache_beam/runners/worker/channel_factory.py:
##########
@@ -23,8 +23,24 @@
 
 class GRPCChannelFactory(grpc.StreamStreamClientInterceptor):
   DEFAULT_OPTIONS = [
-      ("grpc.keepalive_time_ms", 20000),
-      ("grpc.keepalive_timeout_ms", 300000),
+      # Default: 30000ms (30s), increased to 180s to reduce ping frequency
+      ("grpc.keepalive_time_ms", 180000),
+      # Default: 5000ms (5s), increased to 10 minutes for stability
+      ("grpc.keepalive_timeout_ms", 600000),

Review Comment:
   Do we have any periodic messages sent from the SDK to the runner that would 
otherwise detect a dead channel?
   What if the Dataflow runner crashes and restarts while the SDK has no active 
bundles? If we are not sending messages, it seems we might take up to this 
timeout to detect that the runner process we were talking to has died and that 
we need to create a new channel.
   
   If so, this seems bad for streaming pipelines, which want lower latency.
   
   If this is just for testing, what about making this timeout configurable via 
an option?
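
   A minimal sketch of what a configurable timeout might look like, assuming the 
factory accepts per-pipeline overrides. `build_channel_options` and 
`worst_case_detection_ms` are hypothetical names, not existing Beam APIs, and 
the detection bound is a rough estimate (idle time before a keepalive ping plus 
the wait for its ack), not a guarantee of gRPC's actual behavior.

```python
# Hypothetical sketch: these helpers are illustrative and not part of Beam.
DEFAULT_OPTIONS = [
    ("grpc.keepalive_time_ms", 180000),
    ("grpc.keepalive_timeout_ms", 600000),
]


def build_channel_options(overrides=None):
    """Return DEFAULT_OPTIONS with any caller-supplied overrides applied."""
    merged = dict(DEFAULT_OPTIONS)
    merged.update(overrides or {})
    return list(merged.items())


def worst_case_detection_ms(options):
    """Rough upper bound: idle time before a ping plus the wait for its ack."""
    opts = dict(options)
    return opts["grpc.keepalive_time_ms"] + opts["grpc.keepalive_timeout_ms"]


# Assumed override values for a latency-sensitive streaming pipeline.
streaming = build_channel_options({
    "grpc.keepalive_time_ms": 20000,
    "grpc.keepalive_timeout_ms": 30000,
})

print(worst_case_detection_ms(build_channel_options()))  # 780000 (13 minutes)
print(worst_case_detection_ms(streaming))  # 50000 (50 seconds)
```

   The resulting list has the shape that `grpc.insecure_channel(target, 
options=...)` accepts, so a streaming pipeline could opt into tighter values 
while the new defaults stay in place for everything else.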
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
