antoniocorralsierra opened a new issue, #36471:
URL: https://github.com/apache/airflow/issues/36471

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### If "Other Airflow 2 version" selected, which one?
   
   2.7.2
   
   ### What happened?
   
   I have a DAG in which several tasks use KubernetesPodOperator and run in 
parallel. The tasks are configured with 5 retries.
   
   When I mark the DagRun as failed from the UI (calling /dagrun_failed) or 
through the API endpoint 
(https://airflow.apache.org/api/v1/dags/{dag_id}/dagRuns/{dag_run_id}) while 
the tasks are running, they are marked as failed, but afterwards their state 
changes to up_for_retry.
   
   The task log is the following:
   `[2023-12-28, 10:21:43 UTC] {pod_manager.py:351} WARNING - Pod not yet 
started: clientes-wog3wa6y`
   `[2023-12-28, 10:21:43 UTC] {taskinstance.py:844} DEBUG - Refreshing 
TaskInstance <TaskInstance: 10002860.Extraction.clientes 
manual__2023-12-27T11:17:15.812101+00:00 [failed]> from DB`
   `[2023-12-28, 10:21:43 UTC] {local_task_job_runner.py:294} WARNING - State 
of this instance has been externally set to failed. Terminating instance.`
   `[2023-12-28, 10:21:43 UTC] {job.py:216} DEBUG - [heartbeat]`
   `[2023-12-28, 10:21:43 UTC] {process_utils.py:131} INFO - Sending 15 to 
group 632. PIDs of all processes in the group: [632]`
   `[2023-12-28, 10:21:43 UTC] {process_utils.py:86} INFO - Sending the signal 
15 to group 632`
   `[2023-12-28, 10:21:43 UTC] {taskinstance.py:1632} ERROR - Received SIGTERM. 
Terminating subprocesses.`
   `[2023-12-28, 10:21:46 UTC] {taskinstance.py:1937} ERROR - Task failed with 
exception`
   `Traceback (most recent call last):`
   `File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
 line 597, in execute_sync`
   `self.await_pod_start(pod=self.pod)`
   `File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
 line 358, in await_pod_start`
   `time.sleep(1)`
   `File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py",
 line 1634, in signal_handler`
   `raise AirflowException("Task received SIGTERM signal")`
   `During handling of the above exception, another exception occurred:`
   `Traceback (most recent call last):`
   `File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
 line 714, in cleanup`
   `istio_enabled = self.is_istio_enabled(remote_pod)`
   `kubernetes.client.exceptions.ApiException: (404)`
   `Reason: Not Found`
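
The traceback above shows the SIGTERM handler raising an exception while the operator sleeps in its wait loop. A minimal standard-library sketch of that mechanism follows. It is not Airflow code: `TaskTerminated`, `wait_for_start` and `demo` are hypothetical names, and it assumes a POSIX system with the code running in the main thread.

```python
import os
import signal
import threading
import time


class TaskTerminated(Exception):
    """Hypothetical stand-in for the AirflowException seen in the log."""


def signal_handler(signum, frame):
    # Same shape as the handler in the traceback: raise when SIGTERM arrives.
    raise TaskTerminated("Task received SIGTERM signal")


def wait_for_start(poll_seconds=1.0, max_polls=5):
    """Poll in a loop, as a 'wait for the pod to start' routine would."""
    previous = signal.signal(signal.SIGTERM, signal_handler)
    try:
        for _ in range(max_polls):
            time.sleep(poll_seconds)  # the exception surfaces inside this call
        return "timed out"
    except TaskTerminated as exc:
        return f"interrupted: {exc}"
    finally:
        # Restore whatever handler was installed before the demo.
        if previous is None:
            previous = signal.SIG_DFL
        signal.signal(signal.SIGTERM, previous)


def demo():
    # Deliver SIGTERM to this process shortly after the wait begins.
    timer = threading.Timer(0.2, os.kill, args=(os.getpid(), signal.SIGTERM))
    timer.start()
    try:
        return wait_for_start()
    finally:
        timer.cancel()
```

Calling `demo()` shows the sleep being cut short by the exception raised in the handler, which is why the failure in the log is attributed to `time.sleep(1)` inside `await_pod_start`.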
   
   The problem happens when the task is marked as failed while the 
KubernetesPodOperator is waiting for the pod to reach a phase other than 
**Pending**.
   
   The same behaviour is seen when a single task using KubernetesPodOperator 
is marked as failed while it is running but its pod is still in the Pending 
phase.
   
   PS: I use CeleryKubernetesExecutor, and the tasks run on the 
CeleryExecutor.
   
   ### What you think should happen instead?
   
   When I manually mark a DagRun or a task as failed, it should fail without 
retries.
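
The difference between the observed and the expected outcome can be stated as a small decision rule. This is a toy model for illustration only and does not reflect how Airflow implements its retry logic; `final_state` and its parameters are hypothetical.

```python
def final_state(retries_left: int, manually_failed: bool,
                honor_manual_failure: bool) -> str:
    """Toy model of the state a task ends in after its process is terminated.

    honor_manual_failure=False models the behaviour reported in this issue,
    honor_manual_failure=True models the behaviour the reporter expects.
    """
    if manually_failed and honor_manual_failure:
        # Expected: a manual "failed" is final, whatever retries remain.
        return "failed"
    # Ordinary failure handling: retry while attempts remain.
    return "up_for_retry" if retries_left > 0 else "failed"
```

With 5 retries left and a manual failure, the reported behaviour corresponds to `final_state(5, True, False)`, and the expected behaviour to `final_state(5, True, True)`.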
   
   ### How to reproduce
   
   There are two ways:
   
   1. Manually set the state to failed from the UI for a task that uses 
KubernetesPodOperator and is running while its pod is still in the Pending 
phase.
   2. Mark the DagRun as failed from the UI (calling /dagrun_failed) or 
through the API endpoint 
(https://airflow.apache.org/api/v1/dags/{dag_id}/dagRuns/{dag_run_id}).
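
For the API route, the call can be sketched as below. The sketch only builds the request and does not send it. It assumes the stable REST API's `PATCH` on the DagRun endpoint with a `{"state": "failed"}` body. The base URL, the `build_mark_failed_request` helper and the way the `Authorization` header is supplied are assumptions that depend on the deployment's auth backend.

```python
import json
import urllib.parse
import urllib.request


def build_mark_failed_request(base_url, dag_id, dag_run_id, auth_header=None):
    """Build (but do not send) a request that sets a DagRun to failed."""
    # Run ids such as manual__2023-12-27T11:17:15.812101+00:00 contain
    # ':' and '+', so they are percent-encoded before going into the path.
    path = "/api/v1/dags/{}/dagRuns/{}".format(
        urllib.parse.quote(dag_id, safe=""),
        urllib.parse.quote(dag_run_id, safe=""),
    )
    headers = {"Content-Type": "application/json"}
    if auth_header is not None:
        # Assumption: the caller passes a ready-made Authorization value.
        headers["Authorization"] = auth_header
    body = json.dumps({"state": "failed"}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=body, headers=headers, method="PATCH"
    )


# Example with a hypothetical local webserver and the ids from the log above.
request = build_mark_failed_request(
    "http://localhost:8080",
    "10002860",
    "manual__2023-12-27T11:17:15.812101+00:00",
)
```

Passing `request` to `urllib.request.urlopen` while the pods are still Pending should correspond to the second reproduction path.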
   
   ### Operating System
   
   Debian GNU/Linux 11 (bullseye)
   
   ### Versions of Apache Airflow Providers
   
   `apache-airflow-providers-celery          3.3.4`
   `apache-airflow-providers-cncf-kubernetes 7.6.0`
   `apache-airflow-providers-redis           3.3.2`
   
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

