anon-airflow opened a new issue, #44127:
URL: https://github.com/apache/airflow/issues/44127
### Apache Airflow version
Other Airflow 2 version (please specify below)
### If "Other Airflow 2 version" selected, which one?
2.8.2
### What happened?
I frequently get the following error on my Airflow scheduler pods:
```
{kubernetes_executor_utils.py:121} ERROR - Unknown error in KubernetesJobWatcher. Failing
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 112, in run
    self.resource_version = self._run(
    ^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 168, in _run
    for event in self._pod_events(kube_client=kube_client, query_kwargs=kwargs):
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py", line 182, in stream
    raise client.rest.ApiException(
kubernetes.client.exceptions.ApiException: (410)
Reason: Expired: too old resource version: 725263658 (725300129)
Process KubernetesJobWatcher-8:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 112, in run
    self.resource_version = self._run(
    ^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 168, in _run
    for event in self._pod_events(kube_client=kube_client, query_kwargs=kwargs):
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py", line 182, in stream
    raise client.rest.ApiException(
kubernetes.client.exceptions.ApiException: (410)
Reason: Expired: too old resource version: 725263658 (725300129)
```
When this error occurs frequently on my Airflow scheduler pods, all my DAG
runs become very slow. The number of "scheduled" slots is very high, while the
number of "queued" and "running" slots is very low (about 15 slots combined),
even though I have defined 128 slots.
Resource utilization in my namespace is also very low (20% CPU and memory
usage), so the problem is not a lack of resources either.
NOTE: I use the package "apache-airflow-providers-cncf-kubernetes" at
version 8.0.0, as required for Airflow 2.8.2 according to the constraints.
### What you think should happen instead?
I think Airflow should handle this error so that, even when it is raised, the
scheduler continues to work properly and does not "freeze".
### How to reproduce
I think it would happen on any deployment of this Airflow version with
running DAGs.
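The failure mode and the handling suggested above can be simulated without a cluster. The following is only a minimal sketch and not the provider's code: `ApiError` is a hypothetical stand-in for `kubernetes.client.exceptions.ApiException`, `fake_stream` is a hypothetical stand-in for the pod watch, `watch_with_resync` and `max_resyncs` are made-up names, and restarting the watch from resource version `"0"` is an assumption about what a resync could look like.

```python
from typing import Callable, Iterable, Iterator


class ApiError(Exception):
    """Hypothetical stand-in for kubernetes.client.exceptions.ApiException."""

    def __init__(self, status: int, reason: str = "") -> None:
        super().__init__(f"({status}) {reason}")
        self.status = status
        self.reason = reason


def watch_with_resync(
    stream: Callable[[str], Iterable[dict]],
    resource_version: str = "0",
    max_resyncs: int = 3,
) -> Iterator[dict]:
    """Yield events from stream(resource_version).

    On a 410 (expired resource version) the watch is restarted from "0"
    instead of letting the exception end the watcher. Any other error,
    or more than max_resyncs consecutive restarts, is re-raised.
    """
    resyncs = 0
    while True:
        try:
            for event in stream(resource_version):
                # Remember the last version seen so a restart can resume from it.
                resource_version = event["resource_version"]
                resyncs = 0
                yield event
            return  # the stream ended normally
        except ApiError as err:
            if err.status != 410 or resyncs >= max_resyncs:
                raise
            resyncs += 1
            resource_version = "0"  # assumption: "0" requests a fresh start


def fake_stream(resource_version: str) -> Iterator[dict]:
    """Simulated watch: the stale version from the log above is rejected."""
    if resource_version == "725263658":
        raise ApiError(
            410, "Expired: too old resource version: 725263658 (725300129)"
        )
    yield {"type": "MODIFIED", "resource_version": "725300130"}


# Starting from the stale version triggers one 410, then the watch resumes.
events = list(watch_with_resync(fake_stream, resource_version="725263658"))
print(events)
```

In this sketch a 410 costs one resync rather than ending the watcher process, while errors other than 410 still propagate to the caller.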
### Operating System
RHEL 8
### Versions of Apache Airflow Providers
apache-airflow-providers-cncf-kubernetes==8.0.0
### Deployment
Other
### Deployment details
We are in a private cloud with constraints; we used most of the chart
but handled the constraints ourselves.
### Anything else?
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)