aru-trackunit opened a new issue, #39200:
URL: https://github.com/apache/airflow/issues/39200

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### If "Other Airflow 2 version" selected, which one?
   
   2.8.4
   
   ### What happened?
   
   Airflow schedules tasks to run on a Kubernetes cluster. However, for some 
reason, when a task completes, its pod is not cleaned up as it normally would 
be. I am not sure whether the bug is in Airflow itself or in the Kubernetes 
provider.
   
   <img width="1675" alt="Screenshot 2024-04-22 at 11 48 03" 
src="https://github.com/apache/airflow/assets/93520526/6fa4c8b2-d529-4c68-a011-4bb1711f3686">
   
   This causes the system to malfunction: Airflow still considers the tasks 
to be running even though they have completed. It looks like Kubernetes is not 
reporting the state back to Airflow, so the Airflow executor runs out of open 
slots.
   
   <img width="1701" alt="Screenshot 2024-04-22 at 13 58 24" 
src="https://github.com/apache/airflow/assets/93520526/3a2d3e9b-cb82-4851-94f4-1f4b2dae1e7c">
   
   <img width="1715" alt="Screenshot 2024-04-22 at 13 58 58" 
src="https://github.com/apache/airflow/assets/93520526/2c705d27-ca07-43be-951e-ea3f50e5eeaf">
   
   In addition, tasks piled up in the `scheduled` state and could not be 
promoted to the `queued` state.
   <img width="1696" alt="Screenshot 2024-04-22 at 11 49 51" 
src="https://github.com/apache/airflow/assets/93520526/162a3774-2c0c-4871-85a5-7154cbd2044b">
   
   At 10:24 the airflow-scheduler was restarted. `core.parallelism` is set to 
32.
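
   To illustrate the failure mode as we understand it, here is a toy model of 
slot accounting. It is not Airflow's actual executor code: `ToyExecutor`, 
`on_pod_finished` and `simulate` are hypothetical names, and the only value 
taken from our setup is the parallelism of 32. The assumption is that a slot 
stays occupied until the executor learns that the pod has finished.

```python
from dataclasses import dataclass, field


@dataclass
class ToyExecutor:
    """Toy model of slot accounting; not Airflow's real executor class."""

    parallelism: int = 32  # same value as our `core.parallelism`
    running: set[str] = field(default_factory=set)

    @property
    def open_slots(self) -> int:
        # A slot is occupied for as long as the task key stays in `running`.
        return self.parallelism - len(self.running)

    def start(self, task_key: str) -> bool:
        """Occupy a slot if one is free; return False when none are left."""
        if self.open_slots <= 0:
            return False
        self.running.add(task_key)
        return True

    def on_pod_finished(self, task_key: str) -> None:
        """Free the slot. If this event never arrives, the slot leaks."""
        self.running.discard(task_key)


def simulate(num_tasks: int, lost_events: set[int]) -> ToyExecutor:
    """Run tasks one after another, dropping the completion event for some."""
    executor = ToyExecutor()
    for i in range(num_tasks):
        key = f"task-{i}"
        # Tasks whose index is in `lost_events` finish without being reported.
        if executor.start(key) and i not in lost_events:
            executor.on_pod_finished(key)
    return executor
```

   In this model, `simulate(100, set())` ends with all 32 slots open, while 
`simulate(100, set(range(32)))` ends with 0 open slots and every later task 
refused, which matches the symptom of tasks staying in `scheduled`.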
   
   The marked executions show that airflow-scheduler catches up on those tasks 
after the restart. The 5th column shows the start date/time and the 6th the 
end date/time. From the graph one can infer that the job usually takes up to 
2 minutes rather than an hour.
   <img width="1328" alt="Screenshot 2024-04-22 at 14 00 07" 
src="https://github.com/apache/airflow/assets/93520526/7d8b3924-756a-4b83-986f-48e24dbb82eb">
   
   We have enabled debug logs on the scheduler, so hopefully we will know more 
the next time it happens.
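
   A cross-check along these lines might also help confirm the mismatch next 
time. This is only a sketch: `find_stale_running_tasks` is a hypothetical 
helper, and it assumes the two input mappings (task key to Airflow state, task 
key to Kubernetes pod phase) have already been collected by some other means, 
for example from the metadata database and the cluster.

```python
# Kubernetes pod phases that mean the container work is over.
FINISHED_POD_PHASES = {"Succeeded", "Failed"}


def find_stale_running_tasks(
    task_states: dict[str, str],
    pod_phases: dict[str, str],
) -> list[str]:
    """Return task keys Airflow marks as running whose pod has finished.

    Both mappings are assumed inputs keyed by the same task identifier;
    tasks without a known pod are ignored.
    """
    return sorted(
        key
        for key, state in task_states.items()
        if state == "running" and pod_phases.get(key) in FINISHED_POD_PHASES
    )


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    states = {"a": "running", "b": "running", "c": "scheduled"}
    phases = {"a": "Succeeded", "b": "Running"}
    print(find_stale_running_tasks(states, phases))
```

   Any key returned here would be a task that Airflow still counts against 
the executor's slots although its pod is no longer doing work.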
   
   ### What you think should happen instead?
   
   Airflow tasks should continue running.
   
   ### How to reproduce
   
   I could not reproduce the issue, but in the last 3 weeks it has happened 5 
times on our production system. We suspect that it started when we upgraded 
`apache-airflow-providers-cncf-kubernetes` from `7.13.0` to `8.0.1`, and it 
still occurs on `8.1.1`.
   
   Probably related to:
   https://github.com/apache/airflow/issues/36998
   https://github.com/apache/airflow/issues/33402
   
   ### Operating System
   
   Debian GNU/Linux 11 (bullseye)
   
   ### Versions of Apache Airflow Providers
   
   
   | Package | Version | Description |
   |---|---|---|
   | [apache-airflow-providers-amazon](https://airflow.apache.org/docs/apache-airflow-providers-amazon/8.20.0) | 8.20.0 | Amazon integration (including [Amazon Web Services (AWS)](https://aws.amazon.com/)). |
   | [apache-airflow-providers-cncf-kubernetes](https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/8.1.1) | 8.1.1 | [Kubernetes](https://kubernetes.io/) |
   | [apache-airflow-providers-common-io](https://airflow.apache.org/docs/apache-airflow-providers-common-io/1.3.0) | 1.3.0 | ``Common IO Provider`` |
   | [apache-airflow-providers-common-sql](https://airflow.apache.org/docs/apache-airflow-providers-common-sql/1.11.1) | 1.11.1 | [Common SQL Provider](https://en.wikipedia.org/wiki/SQL) |
   | [apache-airflow-providers-databricks](https://airflow.apache.org/docs/apache-airflow-providers-databricks/6.2.0) | 6.2.0 | [Databricks](https://databricks.com/) |
   | [apache-airflow-providers-ftp](https://airflow.apache.org/docs/apache-airflow-providers-ftp/3.7.0) | 3.7.0 | [File Transfer Protocol (FTP)](https://tools.ietf.org/html/rfc114) |
   | [apache-airflow-providers-github](https://airflow.apache.org/docs/apache-airflow-providers-github/2.5.1) | 2.5.1 | [GitHub](https://www.github.com/) |
   | [apache-airflow-providers-google](https://airflow.apache.org/docs/apache-airflow-providers-google/10.17.0) | 10.17.0 | Google services including: [Google Ads](https://ads.google.com/), [Google Cloud (GCP)](https://cloud.google.com/), [Google Firebase](https://firebase.google.com/), [Google LevelDB](https://github.com/google/leveldb/), [Google Marketing Platform](https://marketingplatform.google.com/), [Google Workspace](https://workspace.google.com/) (formerly Google Suite) |
   | [apache-airflow-providers-hashicorp](https://airflow.apache.org/docs/apache-airflow-providers-hashicorp/3.6.4) | 3.6.4 | Hashicorp including [Hashicorp Vault](https://www.vaultproject.io/) |
   | [apache-airflow-providers-http](https://airflow.apache.org/docs/apache-airflow-providers-http/4.10.0) | 4.10.0 | [Hypertext Transfer Protocol (HTTP)](https://www.w3.org/Protocols/) |
   | [apache-airflow-providers-imap](https://airflow.apache.org/docs/apache-airflow-providers-imap/3.5.0) | 3.5.0 | [Internet Message Access Protocol (IMAP)](https://tools.ietf.org/html/rfc3501) |
   | [apache-airflow-providers-mysql](https://airflow.apache.org/docs/apache-airflow-providers-mysql/5.5.4) | 5.5.4 | [MySQL](https://www.mysql.com/) |
   | [apache-airflow-providers-postgres](https://airflow.apache.org/docs/apache-airflow-providers-postgres/5.10.2) | 5.10.2 | [PostgreSQL](https://www.postgresql.org/) |
   | [apache-airflow-providers-sftp](https://airflow.apache.org/docs/apache-airflow-providers-sftp/4.9.1) | 4.9.1 | [SSH File Transfer Protocol (SFTP)](https://tools.ietf.org/wg/secsh/draft-ietf-secsh-filexfer/) |
   | [apache-airflow-providers-smtp](https://airflow.apache.org/docs/apache-airflow-providers-smtp/1.6.1) | 1.6.1 | [Simple Mail Transfer Protocol (SMTP)](https://tools.ietf.org/html/rfc5321) |
   | [apache-airflow-providers-snowflake](https://airflow.apache.org/docs/apache-airflow-providers-snowflake/5.4.0) | 5.4.0 | [Snowflake](https://www.snowflake.com/) |
   | [apache-airflow-providers-sqlite](https://airflow.apache.org/docs/apache-airflow-providers-sqlite/3.7.1) | 3.7.1 | [SQLite](https://www.sqlite.org/) |
   | [apache-airflow-providers-ssh](https://airflow.apache.org/docs/apache-airflow-providers-ssh/3.10.1) | 3.10.1 | [Secure Shell (SSH)](https://tools.ietf.org/html/rfc4251) |
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   We deployed Airflow on a Kubernetes cluster using the Helm chart, with the 
executor set to `KubernetesExecutor`.
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

