Igal Shilman created FLINK-18790:
------------------------------------
Summary: Set a connection timeout that is lower than the request
timeout for remote functions
Key: FLINK-18790
URL: https://issues.apache.org/jira/browse/FLINK-18790
Project: Flink
Issue Type: Improvement
Components: Stateful Functions
Reporter: Igal Shilman
Fix For: statefun-2.2.0
Currently, for remote functions, the connection timeout is identical to the
whole request timeout. This is a problem when a remote function sits
behind a NAT, a load balancer, or in general anything that holds the port
open even though the remote function is no longer present or was relocated.
In that case the entire request budget would be spent waiting for a connection.
This is particularly the case in Kubernetes, when the pods behind a service are
ungracefully killed all at once.
To fix that issue, I propose:
1) by default, use 10% of the total request timeout as the connection timeout.
2) expose the connection timeout as an explicit configuration parameter.
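
The two points above can be sketched roughly as follows. This is only an illustration of the proposed rule, not the StateFun implementation: the class name `RemoteFunctionTimeouts`, its methods, and the clamping of an oversized explicit value to the request timeout are all assumptions made for the example.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of the StateFun API. */
final class RemoteFunctionTimeouts {
    private final Duration requestTimeout;
    private final Duration connectTimeout;

    private RemoteFunctionTimeouts(Duration requestTimeout, Duration connectTimeout) {
        this.requestTimeout = requestTimeout;
        this.connectTimeout = connectTimeout;
    }

    /**
     * Derives the connection timeout from the total request timeout.
     * If no explicit value is configured, 10% of the request timeout is used.
     */
    static RemoteFunctionTimeouts of(
            Duration requestTimeout, Optional<Duration> explicitConnectTimeout) {
        if (requestTimeout.isZero() || requestTimeout.isNegative()) {
            throw new IllegalArgumentException("request timeout must be positive");
        }
        Duration connect =
                explicitConnectTimeout.orElseGet(() -> requestTimeout.dividedBy(10));
        // Assumption: a connection timeout larger than the whole request budget
        // makes no sense, so it is clamped to the request timeout.
        if (connect.compareTo(requestTimeout) > 0) {
            connect = requestTimeout;
        }
        return new RemoteFunctionTimeouts(requestTimeout, connect);
    }

    Duration requestTimeout() {
        return requestTimeout;
    }

    Duration connectTimeout() {
        return connectTimeout;
    }
}
```

With a request timeout of 60 seconds and no explicit setting, this yields a 6 second connection timeout, so a dead endpoint behind a load balancer consumes only a small part of the request budget and leaves time to retry.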