AutomationDev85 opened a new pull request, #60254: URL: https://github.com/apache/airflow/pull/60254
# Overview

We observed intermittent hangs in Kubernetes API communication from KubernetesPodOperator, with calls stalling for over a day until tasks were manually stopped. Investigation showed the Kubernetes Python client wasn’t using a client-side timeout, so stalled connections could block indefinitely.

To improve robustness, we add a client-side timeout to API calls so they raise a clear exception instead of leaving tasks hanging. This does not fix the underlying cluster/API issue, but it makes failures detectable and recoverable.

We chose a 60-second timeout: long enough to tolerate load, short enough to prevent indefinite hangs. Timeouts are applied per call because there’s no clean, consistent way to set this at client creation across sync/async and watch/exec paths.

# Change Summary

* Set a 60-second client-side timeout for Kubernetes API requests.
* Apply the timeout to individual API calls to ensure stalled calls fail fast.
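
As a rough illustration of the per-call approach described above (not the code in this PR), the sketch below injects a default timeout into each call. It assumes the `_request_timeout` keyword that the Kubernetes Python client's generated API methods accept. The helper `with_request_timeout`, the constant `DEFAULT_API_TIMEOUT_SECONDS`, and the stand-in function `fake_read_namespaced_pod` are hypothetical names used only for this example, so that it runs without a cluster.

```python
import functools

# 60 seconds is the value stated in the PR description; the constant name is hypothetical.
DEFAULT_API_TIMEOUT_SECONDS = 60


def with_request_timeout(api_call, timeout=DEFAULT_API_TIMEOUT_SECONDS):
    """Wrap an API method so every call carries a client-side timeout.

    Assumption: the wrapped callable accepts `_request_timeout`, as the
    Kubernetes Python client's generated API methods do. A timeout the
    caller passes explicitly is left untouched.
    """

    @functools.wraps(api_call)
    def wrapper(*args, **kwargs):
        # Only fill in the default when the caller did not choose a timeout.
        kwargs.setdefault("_request_timeout", timeout)
        return api_call(*args, **kwargs)

    return wrapper


def fake_read_namespaced_pod(name, namespace, **kwargs):
    """Stand-in for a client method such as CoreV1Api.read_namespaced_pod."""
    return {"name": name, "namespace": namespace, "kwargs": kwargs}


read_pod = with_request_timeout(fake_read_namespaced_pod)

# The default timeout is injected when none is given.
result = read_pod("my-pod", "default")

# An explicit per-call timeout takes precedence over the default.
overridden = read_pod("my-pod", "default", _request_timeout=5)
```

With a real client, the same wrapper could presumably be applied to a bound method, for example `with_request_timeout(core_v1.read_namespaced_pod)`, where `core_v1` is a `CoreV1Api` instance. Watch and exec paths may need separate handling, as the description notes.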
