antoniocorralsierra opened a new issue, #36471: URL: https://github.com/apache/airflow/issues/36471
### Apache Airflow version

Other Airflow 2 version (please specify below)

### If "Other Airflow 2 version" selected, which one?

2.7.2

### What happened?

I have a DAG where several tasks use KubernetesPodOperator and run in parallel. The tasks are configured with 5 retries. When I mark the DagRun as failed from the UI (calling /dagrun_failed) or via the API endpoint (https://airflow.apache.org/api/v1/dags/{dag_id}/dagRuns/{dag_run_id}) while the tasks are running, they are marked as failed, but afterwards their status changes to up_for_retry. The task log is the following:

```
[2023-12-28, 10:21:43 UTC] {pod_manager.py:351} WARNING - Pod not yet started: clientes-wog3wa6y
[2023-12-28, 10:21:43 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: 10002860.Extraction.clientes manual__2023-12-27T11:17:15.812101+00:00 [failed]> from DB
[2023-12-28, 10:21:43 UTC] {local_task_job_runner.py:294} WARNING - State of this instance has been externally set to failed. Terminating instance.
[2023-12-28, 10:21:43 UTC] {job.py:216} DEBUG - [heartbeat]
[2023-12-28, 10:21:43 UTC] {process_utils.py:131} INFO - Sending 15 to group 632. PIDs of all processes in the group: [632]
[2023-12-28, 10:21:43 UTC] {process_utils.py:86} INFO - Sending the signal 15 to group 632
[2023-12-28, 10:21:43 UTC] {taskinstance.py:1632} ERROR - Received SIGTERM. Terminating subprocesses.
[2023-12-28, 10:21:46 UTC] {taskinstance.py:1937} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 597, in execute_sync
    self.await_pod_start(pod=self.pod)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 358, in await_pod_start
    time.sleep(1)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 1634, in signal_handler
    raise AirflowException("Task received SIGTERM signal")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 714, in cleanup
    istio_enabled = self.is_istio_enabled(remote_pod)
kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
```

The problem happens when the task is marked as failed while the KubernetesPodOperator is waiting for the pod to reach a phase other than **Pending**. The same behaviour is seen when a single task using KubernetesPodOperator is marked as failed while it is running but its pod is in Pending status.

P.S.: I use CeleryKubernetesExecutor and these tasks run on CeleryExecutor.

### What you think should happen instead?

When I mark a DagRun or a task as failed manually, it should fail without retries.

### How to reproduce

There are two ways:

1. Manually set the failed state from the UI on a task that uses KubernetesPodOperator and is running, while the pod of the task is in Pending status.
2.
   Set the DagRun as failed from the UI (calling /dagrun_failed) or via the API endpoint (https://airflow.apache.org/api/v1/dags/{dag_id}/dagRuns/{dag_run_id}).

### Operating System

Debian GNU/Linux 11 (bullseye)

### Versions of Apache Airflow Providers

`apache-airflow-providers-celery 3.3.4`
`apache-airflow-providers-cncf-kubernetes 7.6.0`
`apache-airflow-providers-redis 3.3.2`

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

_No response_

### Anything else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
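The second reproduce step can also be scripted. Below is a minimal sketch that builds the PATCH request the stable REST API expects for setting a DagRun's state to failed, matching the endpoint URL quoted above. The base URL, dag_id, and dag_run_id are placeholders, and authentication is omitted:

```python
import json


def build_mark_failed_request(base_url: str, dag_id: str, dag_run_id: str) -> dict:
    """Build the PATCH request that sets a DagRun's state to 'failed'
    via the stable REST API: PATCH /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}."""
    return {
        "method": "PATCH",
        "url": f"{base_url}/api/v1/dags/{dag_id}/dagRuns/{dag_run_id}",
        "headers": {"Content-Type": "application/json"},
        # The stable API takes a JSON body of {"state": "failed"}.
        "body": json.dumps({"state": "failed"}),
    }


# Example with placeholder host and IDs; send with any HTTP client
# (e.g. requests.patch(req["url"], data=req["body"], headers=req["headers"], auth=...)).
req = build_mark_failed_request(
    "http://localhost:8080", "my_dag", "manual__2023-12-27T11:17:15.812101+00:00"
)
print(req["url"])
```

Issuing this request while a KubernetesPodOperator task's pod is still Pending should reproduce the up_for_retry behaviour described above.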
