Vamsi-klu opened a new pull request, #62307:
URL: https://github.com/apache/airflow/pull/62307

   ## Summary
   
   - Adds runtime validation in `KubernetesHook.get_conn()` that detects 
`aws eks get-token` exec auth in the kubeconfig and checks whether the AWS CLI's 
botocore includes the fix for the `~/.aws/cli/cache` race condition 
(botocore >= 1.40.2)
   - New connection extra `exec_auth_aws_cli_version_check_mode` with three 
modes: `warn` (default), `fail`, `ignore`
   - Botocore version is resolved via `aws --version` and cached with 
`@lru_cache` to avoid repeated subprocess calls within the same process
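
As a rough illustration of how the version check and the three modes could fit together, here is a minimal sketch. The helper names `parse_botocore_version` and `apply_check_mode`, the exception types, the sample version string, and the choice to skip the check when no version can be determined are all assumptions made for illustration, not the PR's actual code. Only the three mode names and the 1.40.2 threshold come from the description above.

```python
import logging
import re
from typing import Optional, Tuple

log = logging.getLogger(__name__)

Version = Tuple[int, int, int]

# Threshold taken from the PR description; everything else here is hypothetical.
MIN_BOTOCORE: Version = (1, 40, 2)
VALID_MODES = ("warn", "fail", "ignore")


def parse_botocore_version(version_output: str) -> Optional[Version]:
    """Extract a `botocore/X.Y.Z` token from version text, or return None."""
    match = re.search(r"botocore/(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def apply_check_mode(mode: str, installed: Optional[Version]) -> bool:
    """Return True if a warning was logged; raise RuntimeError in "fail" mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {VALID_MODES}")
    # Assumption: an undetectable version is skipped rather than treated as bad.
    if mode == "ignore" or installed is None or installed >= MIN_BOTOCORE:
        return False
    found = ".".join(map(str, installed))
    required = ".".join(map(str, MIN_BOTOCORE))
    message = (
        f"botocore {found} predates {required}; "
        "concurrent `aws eks get-token` calls may fail intermittently"
    )
    if mode == "fail":
        raise RuntimeError(message)
    log.warning(message)
    return True


# Illustrative input only; real `aws --version` output may differ.
sample = "aws-cli/1.42.0 Python/3.11.9 Linux/6.8.0 botocore/1.40.1"
apply_check_mode("warn", parse_botocore_version(sample))
```

Comparing versions as integer tuples avoids the string-ordering pitfall where `"1.9.0"` would sort after `"1.40.2"`.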
   
   ## Motivation
   
   Closes #60943: intermittent `403 Forbidden` failures on 
`KubernetesPodOperator` with CeleryExecutor when multiple tasks invoke 
`aws eks get-token` concurrently on the same worker. The root cause is a race 
condition in botocore < 1.40.2 when it creates the `~/.aws/cli/cache` directory. 
This PR gives operators a clear signal at connection setup time instead of a 
misleading 403.
   
   ## Approach
   
   This implements **Approach 1** from #61936 — placing the guardrail directly 
in `KubernetesHook` since the vulnerability affects any kubeconfig using AWS 
exec auth, not just users going through `EksHook`.
   
   ### Key design decisions:
   - **`@lru_cache(maxsize=8)`** on `_get_aws_cli_botocore_version()` so the 
subprocess runs once per `aws` binary and the result is cached for the lifetime 
of the Python process
   - **Early exit at every step** — no cost when exec auth isn't AWS-based
   - **Kubeconfig parsing is defensive** — `yaml.YAMLError`, `OSError`, missing 
keys all result in a skip (debug log), never a crash
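
The design decisions above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the PR: the function names `find_aws_eks_exec_command` and `aws_version_output` are hypothetical, the sketch works on an already-parsed kubeconfig (so the `yaml.YAMLError` and `OSError` handling around loading is omitted), and the 10-second timeout is an arbitrary choice.

```python
import os
import subprocess
from functools import lru_cache
from typing import Any, Optional


def find_aws_eks_exec_command(kubeconfig: Any) -> Optional[str]:
    """Return the exec command if any user runs `aws eks get-token`, else None.

    Every unexpected shape results in a skip instead of an exception.
    """
    if not isinstance(kubeconfig, dict):
        return None  # early exit: nothing usable was parsed
    users = kubeconfig.get("users")
    if not isinstance(users, list):
        return None  # early exit: no users section
    for entry in users:
        user = entry.get("user") if isinstance(entry, dict) else None
        exec_cfg = user.get("exec") if isinstance(user, dict) else None
        if not isinstance(exec_cfg, dict):
            continue  # this user does not use exec auth
        command = exec_cfg.get("command")
        args = exec_cfg.get("args") or []
        if not isinstance(command, str) or not isinstance(args, list):
            continue  # malformed exec block: skip it
        # Assumption: the binary is named exactly "aws", possibly with a path.
        if os.path.basename(command) == "aws" and "eks" in args and "get-token" in args:
            return command
    return None


@lru_cache(maxsize=8)
def aws_version_output(aws_binary: str) -> Optional[str]:
    """Run `<aws_binary> --version` at most once per distinct binary path.

    Failures return None, and that None is cached as well.
    """
    try:
        result = subprocess.run(
            [aws_binary, "--version"],
            capture_output=True,
            text=True,
            timeout=10,  # arbitrary value chosen for this sketch
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Check both streams, since the destination may vary between CLI builds.
    return (result.stdout or result.stderr).strip() or None


example_config = {
    "users": [
        {
            "name": "eks-user",
            "user": {
                "exec": {
                    "command": "aws",
                    "args": ["eks", "get-token", "--cluster-name", "demo"],
                }
            },
        }
    ]
}
command = find_aws_eks_exec_command(example_config)
```

Because the cache key is the binary path, a kubeconfig that points at a different `aws` executable triggers its own subprocess call, while repeated connections through the same binary reuse the stored result.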
   
   ## Files changed
   
   | File | Change |
   |------|--------|
   | `hooks/kubernetes.py` | Helper functions + `_check_exec_auth_aws_cli_botocore_version()` + integration into all 4 `get_conn()` paths |
   | `docs/connections/kubernetes.rst` | Document new connection extra |
   | `tests/.../test_kubernetes.py` | 10 new unit tests covering detection, version parsing, all 3 modes, fallback, and `get_conn` integration |
   
   ## Test plan
   
   - [x] All 10 new tests pass
   - [x] Full existing test suite (144 tests) passes with zero regressions
   - [ ] Manual validation with EKS kubeconfig using `aws eks get-token`
   
   cc @jscheffl @o-nikolas @jedcunningham @hussein-awala — would appreciate 
your feedback on this approach
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)

