paramjeet01 commented on issue #39096: URL: https://github.com/apache/airflow/issues/39096#issuecomment-2118726874
I believe I have identified the cause of the issue: We are using AWS Spot EC2 instances for the workloads in Airflow. When a spot instance is terminated, the pod enters a terminating state for around 2 minutes. During the second retry, the pod is rescheduled, and the [find_pod](https://github.com/apache/airflow/blob/2.8.3/airflow/providers/cncf/kubernetes/operators/pod.py#L535) method is used to retrieve the pod based on the labels, which results in the following error:

```
[2024-04-18, 01:32:20 IST] {pod.py:1109} ERROR - 'NoneType' object has no attribute 'metadata'
Traceback (most recent call last):
  File "/opt/airflow/plugins/operators/kubernetes_pod_operator.py", line 153, in execute
    self.remote_pod = self.find_pod(self.pod.metadata.namespace, context=context)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 523, in find_pod
    raise AirflowException(f"More than one pod running with labels {label_selector}")
airflow.exceptions.AirflowException: More than one pod running with labels {**** our labels *****}
```

At this point, we have a pod in a terminating state and a new pod created by the second retry. When the [cleanup](https://github.com/apache/airflow/blob/2.8.3/airflow/providers/cncf/kubernetes/operators/pod.py#L633) method is called, it encounters another error because the `find_pod` method did not return anything due to the exception:

```
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 937, in patch_already_checked
    name=pod.metadata.name,
AttributeError: 'NoneType' object has no attribute 'metadata'
```

After every retry, a new pod is created and never cleaned up, so this loops forever.
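To make the failure mode concrete, here is a minimal, self-contained sketch of the two guards this analysis suggests: ignore pods that are already terminating when selecting by label, and skip cleanup when no pod was found. This is not Airflow's code or its actual fix. `Pod`, `PodMeta`, `pick_active_pod`, and `safe_cleanup` are hypothetical names used only for illustration. The one real-world detail relied on is that Kubernetes sets `metadata.deletionTimestamp` (exposed as `deletion_timestamp` in the Python client) once a pod's deletion has been requested.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence


@dataclass
class PodMeta:
    name: str
    # Set by Kubernetes once deletion has been requested (pod shows as Terminating).
    deletion_timestamp: Optional[datetime] = None


@dataclass
class Pod:
    metadata: PodMeta


def pick_active_pod(pods: Sequence[Pod]) -> Optional[Pod]:
    """Return the single non-terminating pod, or None if there is none."""
    active = [p for p in pods if p.metadata.deletion_timestamp is None]
    if len(active) > 1:
        names = [p.metadata.name for p in active]
        raise RuntimeError(f"More than one active pod: {names}")
    return active[0] if active else None


def safe_cleanup(pod: Optional[Pod]) -> str:
    """Skip cleanup instead of dereferencing a missing pod."""
    if pod is None:
        return "skipped: no pod found"
    return f"cleaned up {pod.metadata.name}"


# Hypothetical scenario: the old pod is still terminating on the reclaimed
# spot node while the retry has already created a new pod with the same labels.
terminating = Pod(PodMeta("task-abc-old", deletion_timestamp=datetime.now(timezone.utc)))
retry = Pod(PodMeta("task-abc-new"))

print(safe_cleanup(pick_active_pod([terminating, retry])))
print(safe_cleanup(None))
```

With both pods present, the selection resolves to the retry's pod rather than raising, and a missing pod no longer triggers an `AttributeError` during cleanup. Whether the operator should filter terminating pods this way is a design question for the maintainers; the sketch only illustrates the idea.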
