andrewhharmon opened a new issue, #61737:
URL: https://github.com/apache/airflow/issues/61737
### Apache Airflow Provider(s)
cncf-kubernetes
### Versions of Apache Airflow Providers
Affected: apache-airflow-providers-cncf-kubernetes==10.12.3 (regression introduced in 10.12.0)
Working in: apache-airflow-providers-cncf-kubernetes==10.11.0
### Apache Airflow version
3.0.0 (also affects 2.x with the affected provider version)
### Operating System
Debian/Ubuntu-based containers (Astronomer Runtime)
### Deployment
Astronomer
### Deployment details
Triggerer runs on a separate host from the worker. EKS cluster
authentication uses exec-based kubeconfig (`aws eks get-token`), where the exec
command must be re-invoked periodically to obtain fresh short-lived tokens.
### What happened
`KubernetesPodTrigger` fails with 401 Unauthorized after ~15 minutes when
using exec-based kubeconfig authentication (e.g., EKS clusters with `aws eks
get-token`).
**Root cause:** In version 10.12.0, a `_config_loaded` caching guard was
added to `AsyncKubernetesHook._load_config()`:
```python
async def _load_config(self):
    """Load Kubernetes configuration once per hook instance."""
    if self._config_loaded:  # <-- new in 10.12.x
        return
    # ... load config, execute exec plugin, get token ...
    self._config_loaded = True
```
In previous versions (10.11.x and earlier), `_load_config()` ran on every
`get_conn()` call. This meant the exec plugin (e.g., `aws eks get-token`) was
re-invoked on each poll, always producing a fresh token.
With the `_config_loaded` guard, the exec plugin runs **once** for the
lifetime of the hook instance. Since `KubernetesPodTrigger.hook` is a
`@cached_property`, the hook (and therefore the stale token) persists for the
entire duration of the trigger. EKS STS tokens expire after ~15 minutes, so any
pod monitored longer than that gets 401 Unauthorized.
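To make the interaction between the guard and the cached hook concrete, here is a minimal, self-contained simulation. It does not use the real provider classes: `FakeHook`, `FakeTrigger`, `FakeClock`, and the 900-second lifetime are hypothetical stand-ins chosen for illustration, and the "exec plugin" is just a counter.

```python
import asyncio
from functools import cached_property

TOKEN_LIFETIME = 900  # seconds; assumed, mirrors the ~15 minute EKS token lifetime


class FakeClock:
    """Manually advanced clock so the simulation runs instantly."""

    def __init__(self):
        self.now = 0.0


class FakeHook:
    """Hypothetical stand-in for AsyncKubernetesHook (not the real class)."""

    def __init__(self, clock, cache_config):
        self.clock = clock
        self.cache_config = cache_config
        self._config_loaded = False
        self.token_issued_at = None
        self.exec_calls = 0

    async def _load_config(self):
        # With cache_config=True this mirrors the 10.12.x guard.
        if self.cache_config and self._config_loaded:
            return
        # Stand-in for invoking the exec plugin and receiving a fresh token.
        self.exec_calls += 1
        self.token_issued_at = self.clock.now
        self._config_loaded = True

    async def get_pod(self):
        await self._load_config()
        if self.clock.now - self.token_issued_at >= TOKEN_LIFETIME:
            raise PermissionError("401 Unauthorized")
        return "pod"


class FakeTrigger:
    """Hypothetical stand-in for KubernetesPodTrigger."""

    def __init__(self, clock, cache_config):
        self.clock = clock
        self.cache_config = cache_config

    @cached_property
    def hook(self):
        # One hook instance for the whole lifetime of the trigger.
        return FakeHook(self.clock, self.cache_config)


async def poll(trigger, minutes, interval=60):
    """Poll once per interval; return the minute of the first 401, or None."""
    for minute in range(minutes + 1):
        trigger.clock.now = minute * interval
        try:
            await trigger.hook.get_pod()
        except PermissionError:
            return minute
    return None


def run_demo():
    cached = asyncio.run(poll(FakeTrigger(FakeClock(), cache_config=True), 20))
    uncached = asyncio.run(poll(FakeTrigger(FakeClock(), cache_config=False), 20))
    return cached, uncached
```

In this model the guarded variant starts failing once the simulated token lifetime has elapsed, while the variant that reloads on every call keeps working for the whole run.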
**Error output:**
```
kubernetes_asyncio.client.exceptions.ApiException: (401)
Reason: Unauthorized
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
```
**Stack trace (from triggerer):**
```
  File "airflow/providers/cncf/kubernetes/triggers/pod.py", line 318, in _get_pod
    pod = await self.hook.get_pod(name=self.pod_name, namespace=self.pod_namespace)
  File "airflow/providers/cncf/kubernetes/hooks/kubernetes.py", line 948, in get_pod
    pod: V1Pod = await v1_api.read_namespaced_pod(
```
The `@tenacity.retry` on `_get_pod()` (3 attempts) and `@generic_api_retry`
on `get_pod()` do not help because every retry reuses the same cached hook with
the same expired token.
### What you think should happen instead
`_load_config()` should support exec-based auth that requires periodic token
refresh. The `_config_loaded` optimization is valid for static credentials
(bearer tokens, certificates, in-cluster service accounts) but breaks
exec-based credential plugins that produce short-lived tokens.
Possible approaches:
1. **Track the exec token's expiration and reload when needed.** When
`load_kube_config_from_dict()` processes an exec plugin, the response includes
an `expirationTimestamp`. The hook could store this and reset `_config_loaded`
when approaching expiry.
2. **Reset `_config_loaded` periodically.** A simpler approach — reset the
flag on a configurable interval (e.g., 10 minutes) so that exec plugins are
re-invoked before typical token lifetimes expire.
3. **Don't cache when config uses exec-based auth.** After loading the
config, check if the user auth uses an exec plugin. If so, skip setting
`_config_loaded = True` so it reloads on each `get_conn()` call (restoring the
pre-10.12.0 behavior for exec-based configs).
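As a rough illustration of approach 1, the sketch below caches a loaded config until shortly before its reported expiration and treats a missing expiration as static credentials. It is not provider code: `ExpiringConfigCache`, `parse_expiration`, the `loader` callable, and the two-minute `REFRESH_MARGIN` are all hypothetical names and values, and how the hook would actually obtain the `expirationTimestamp` from the exec plugin response is left open.

```python
from datetime import datetime, timedelta, timezone

REFRESH_MARGIN = timedelta(minutes=2)  # assumed safety margin before expiry


def parse_expiration(timestamp):
    """Parse an RFC 3339 timestamp such as '2024-01-01T00:15:00Z'."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


class ExpiringConfigCache:
    """Hypothetical helper: reload config only when the token is near expiry."""

    def __init__(self, loader, now=lambda: datetime.now(timezone.utc)):
        # loader returns (config, expiration_timestamp_or_None)
        self._loader = loader
        self._now = now
        self._config = None
        self._expires_at = None
        self._loaded = False

    def _needs_reload(self):
        if not self._loaded:
            return True
        if self._expires_at is None:
            # No expiration reported: treat as static credentials, keep cache.
            return False
        return self._now() >= self._expires_at - REFRESH_MARGIN

    def get(self):
        if self._needs_reload():
            self._config, expiration = self._loader()
            self._expires_at = parse_expiration(expiration) if expiration else None
            self._loaded = True
        return self._config
```

This keeps the 10.12.0 optimization for static credentials while re-running the loader for short-lived tokens. Approach 2 would amount to the same structure with a fixed interval in place of the reported expiration.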
### How to reproduce
1. Configure a `KubernetesPodOperator` (or `EksPodOperator`) with
`deferrable=True` connecting to a cluster that uses exec-based kubeconfig auth
(e.g., EKS with `aws eks get-token`)
2. Use `apache-airflow-providers-cncf-kubernetes>=10.12.0`
3. Run a pod that runs longer than the exec token's lifetime (~15 minutes
for EKS)
4. Observe 401 Unauthorized from the triggerer after the token expires
To verify the regression, downgrade to
`apache-airflow-providers-cncf-kubernetes==10.11.0` — the same DAG will succeed.
### Anything else
**Affected authentication methods:** Any exec-based credential plugin that
produces short-lived tokens. This includes:
- AWS EKS (`aws eks get-token`) — tokens expire in ~15 minutes
- GKE with `gke-gcloud-auth-plugin` — tokens expire in ~60 minutes
- Any custom exec plugin with token expiration
**Not affected:** Static bearer tokens, client certificates, in-cluster
service account tokens (which are auto-rotated by the kubelet).
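One way to separate affected from unaffected configurations (the check approach 3 would need) is to look at the user entry that the active kubeconfig context points to. The helper below is a sketch that operates on a kubeconfig already parsed into a dict; `uses_exec_auth` is a hypothetical name, not an existing provider function, and it only checks for the `exec` key.

```python
def uses_exec_auth(kubeconfig, context_name=None):
    """Return True if the selected context's user authenticates via an exec plugin.

    `kubeconfig` is a kubeconfig file parsed into a dict. If `context_name`
    is not given, the config's `current-context` is used.
    """
    context_name = context_name or kubeconfig.get("current-context")

    # Map context name -> context body, tolerating missing or empty sections.
    contexts = {
        c["name"]: c.get("context") or {}
        for c in kubeconfig.get("contexts") or []
    }
    user_name = contexts.get(context_name, {}).get("user")

    # Map user name -> user body.
    users = {
        u["name"]: u.get("user") or {}
        for u in kubeconfig.get("users") or []
    }
    return "exec" in users.get(user_name, {})


# Example kubeconfig shaped like one generated for EKS (values are illustrative).
eks_style = {
    "current-context": "eks",
    "contexts": [{"name": "eks", "context": {"cluster": "c1", "user": "eks-user"}}],
    "users": [
        {
            "name": "eks-user",
            "user": {
                "exec": {
                    "command": "aws",
                    "args": ["eks", "get-token", "--cluster-name", "c1"],
                }
            },
        }
    ],
}
```

With a check like this, the hook could skip setting `_config_loaded = True` only when it returns `True`, leaving the cached path in place for static credentials.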
### Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)