paramjeet01 opened a new issue, #39096:
URL: https://github.com/apache/airflow/issues/39096

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### If "Other Airflow 2 version" selected, which one?
   
   2.8.3
   
   ### What happened?
   
   The airflow terminated the task, scheduling it for a retry. However, during 
the subsequent retry attempt, an error occurred indicating that the pod with 
identical labels still persisted. Upon inspection, I found the pods were still 
active from the initial attempt.
   
   **First attempt error log:**
   ```
   [2024-04-18, 01:30:40 IST] {local_task_job_runner.py:121} ERROR - Received 
SIGTERM. Terminating subprocesses
   [2024-04-18, 01:30:40 IST] {process_utils.py:131} INFO - Sending 15 to group 
125. PIDs of all processes in the group: [125]
   [2024-04-18, 01:30:40 IST] {process_utils.py:86} INFO - Sending the signal 
15 to group 125
   [2024-04-18, 01:30:40 IST] {taskinstance.py:2483} ERROR - Received SIGTERM. 
Terminating subprocesses.
   [2024-04-18, 01:30:40 IST] {taskinstance.py:2731} ERROR - Task failed with 
exception
   Traceback (most recent call last):
     File "/opt/airflow/plugins/operators/kubernetes_pod_operator.py", line 
157, in execute
       self.pod_manager.fetch_requested_container_logs(
     File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
 line 580, in fetch_requested_container_logs
       status = self.fetch_container_logs(pod=pod, container_name=c, 
follow=follow_logs)
     File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
 line 503, in fetch_container_logs
       last_log_time, exc = consume_logs(since_time=last_log_time)
     File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
 line 450, in consume_logs
       for raw_line in logs:
     File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
 line 240, in __iter__
       for data_chunk in self.response.stream(amt=None, decode_content=True):
     File 
"/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 
933, in stream
       yield from self.read_chunked(amt, decode_content=decode_content)
     File 
"/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 
1073, in read_chunked
       self._update_chunk_length()
     File 
"/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 
1001, in _update_chunk_length
       line = self._fp.fp.readline()  # type: ignore[union-attr]
     File "/usr/local/lib/python3.10/socket.py", line 705, in readinto
       return self._sock.recv_into(b)
     File "/usr/local/lib/python3.10/ssl.py", line 1307, in recv_into
       return self.read(nbytes, buffer)
     File "/usr/local/lib/python3.10/ssl.py", line 1163, in read
       return self._sslobj.read(len, buffer)
     File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py",
 line 2485, in signal_handler
       raise AirflowException("Task received SIGTERM signal")
   airflow.exceptions.AirflowException: Task received SIGTERM signal
   During handling of the above exception, another exception occurred:
   Traceback (most recent call last):
     File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py",
 line 439, in _execute_task
       result = _execute_callable(context=context, **execute_callable_kwargs)
     File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py",
 line 414, in _execute_callable
       return execute_callable(context=context, **execute_callable_kwargs)
     File "/opt/airflow/plugins/operators/kubernetes_pod_operator.py", line 
184, in execute
       raise AirflowException(f'Pod Launching failed: {ex}')
   airflow.exceptions.AirflowException: Pod Launching failed: Task received 
SIGTERM signal
   [2024-04-18, 01:30:40 IST] {taskinstance.py:527} DEBUG - Task Duration set 
to 10.25225
   [2024-04-18, 01:30:40 IST] {taskinstance.py:549} DEBUG - Clearing 
next_method and next_kwargs.
   [2024-04-18, 01:30:40 IST] {taskinstance.py:1149} INFO - Marking task as 
UP_FOR_RETRY. 
   ```
   **Second attempt error log:**
   ```
   [2024-04-18, 01:32:20 IST] {pod.py:1109} ERROR - 'NoneType' object has no 
attribute 'metadata'
   Traceback (most recent call last):
     File "/opt/airflow/plugins/operators/kubernetes_pod_operator.py", line 
153, in execute
       self.remote_pod = self.find_pod(self.pod.metadata.namespace, 
context=context)
     File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
 line 523, in find_pod
       raise AirflowException(f"More than one pod running with labels 
{label_selector}")
   airflow.exceptions.AirflowException: More than one pod running with labels 
{**** our labels *****}
   During handling of the above exception, another exception occurred:
   Traceback (most recent call last):
     File "/opt/airflow/plugins/operators/kubernetes_pod_operator.py", line 
184, in execute
       raise AirflowException(f'Pod Launching failed: {ex}')
   airflow.exceptions.AirflowException: Pod Launching failed: More than one pod 
running with labels {**** our labels *****}
   During handling of the above exception, another exception occurred:
   Traceback (most recent call last):
     File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
 line 937, in patch_already_checked
       name=pod.metadata.name,
   AttributeError: 'NoneType' object has no attribute 'metadata'
   [2024-04-18, 01:32:20 IST] {taskinstance.py:2731} ERROR - Task failed with 
exception
   Traceback (most recent call last):
     File "/opt/airflow/plugins/operators/kubernetes_pod_operator.py", line 
153, in execute
       self.remote_pod = self.find_pod(self.pod.metadata.namespace, 
context=context)
     File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
 line 523, in find_pod
       raise AirflowException(f"More than one pod running with labels 
{label_selector}")
   airflow.exceptions.AirflowException: More than one pod running with labels 
{**** our labels *****}
   During handling of the above exception, another exception occurred:
   Traceback (most recent call last):
     File "/opt/airflow/plugins/operators/kubernetes_pod_operator.py", line 
184, in execute
       raise AirflowException(f'Pod Launching failed: {ex}')
   airflow.exceptions.AirflowException: Pod Launching failed: {**** our labels 
*****}
   During handling of the above exception, another exception occurred:
   Traceback (most recent call last):
     File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py",
 line 439, in _execute_task
       result = _execute_callable(context=context, **execute_callable_kwargs)
     File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py",
 line 414, in _execute_callable
       return execute_callable(context=context, **execute_callable_kwargs)
     File "/opt/airflow/plugins/operators/kubernetes_pod_operator.py", line 
186, in execute
       self.cleanup(
     File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
 line 839, in cleanup
       raise AirflowException(
   airflow.exceptions.AirflowException: Pod { *** our pod name ***} returned a 
failure.
   remote_pod: None
   [2024-04-18, 01:32:20 IST] {taskinstance.py:527} DEBUG - Task Duration set 
to 1.162314
   [2024-04-18, 01:32:20 IST] {taskinstance.py:549} DEBUG - Clearing 
next_method and next_kwargs.
   [2024-04-18, 01:32:20 IST] {taskinstance.py:1149} INFO - Marking task as 
UP_FOR_RETRY. 
   [2024-04-18, 01:32:20 IST] {plugins.py:178} INFO - Getting Refined Message
   [2024-04-18, 01:32:20 IST] {plugins.py:180} INFO - Message Payload Not 
Provided
   [2024-04-18, 01:32:20 IST] {logging_mixin.py:188} INFO - {'Airflow 
Exception': 'Pod Launching failed'}
   [2024-04-18, 01:32:20 IST] {plugins.py:183} INFO - Task message: {'Airflow 
Exception': 'Pod Launching failed'}
   
   ```
   
   
   ### What you think should happen instead?
   
   Once the SIGTERM Terminating subprocesses is issued to the task it should 
properly delete the pod.
   
   ### How to reproduce
   
   Let airflow kill your task with SIGTERM and on the next retry you'll face 
pod already exists with same labels
   
   ### Operating System
   
   Amazon Linux 2
   
   ### Versions of Apache Airflow Providers
   
   pytest>=6.2.5
   docker>=5.0.0
   crypto>=1.4.1
   cryptography>=3.4.7
   pyOpenSSL>=20.0.1
   ndg-httpsclient>=0.5.1
   boto3>=1.34.0
   sqlalchemy
   redis>=3.5.3
   requests>=2.26.0
   pysftp>=0.2.9
   werkzeug>=1.0.1
   apache-airflow-providers-cncf-kubernetes==8.0.0
   apache-airflow-providers-amazon>=8.13.0
   psycopg2>=2.8.5
   grpcio>=1.37.1
   grpcio-tools>=1.37.1
   protobuf>=3.15.8,<=3.21
   python-dateutil>=2.8.2
   jira>=3.1.1
   confluent_kafka>=1.7.0
   pyarrow>=10.0.1,<10.1.0
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Official helm chart deployment
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to