hussein-awala commented on issue #33402:
URL: https://github.com/apache/airflow/issues/33402#issuecomment-1679565485

   I have two questions:
   1. I see in the log:
      ```
      Deleted pod: TaskInstanceKey(dag_id='SAP_WMD_AGWI_RSRVD_TIMEOUT_H', task_id='SAP_WMD_AGWI_RSRVD_TIMEOUT_H', run_id='scheduled__2023-08-15T10:05:00+00:00', try_number=1, map_index=-1) in namespace airflow-test-ns
      ```
      Was this pod actually deleted by the executor? (If you no longer have
      that information, you can check with a new task.)
   2. If the answer to my first question is yes, then the problem is partial:
      the executor is deleting most of the pods. So my second question is: did
      the undeleted pods complete around the same time as the
      `KubernetesJobWatcher` exception (within a minute before or after it)?
   
   My guess is that we lose some events when the job watcher fails/restarts.
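
For the second question, here is a rough sketch of the comparison, assuming you can collect each leftover pod's completion time and the time of the exception from the logs. The function name, pod names, and timestamps below are hypothetical and only illustrate the check:

```python
from datetime import datetime, timedelta


def pods_near_exception(pod_finish_times, exception_time, window=timedelta(minutes=1)):
    """Return the names of pods that finished within `window` of the exception."""
    return sorted(
        name
        for name, finished in pod_finish_times.items()
        if abs(finished - exception_time) <= window
    )


# Hypothetical values: replace with the real timestamps from your logs.
exception_time = datetime(2023, 8, 15, 10, 6, 0)
pod_finish_times = {
    "pod-a": datetime(2023, 8, 15, 10, 5, 30),  # 30 seconds before the exception
    "pod-b": datetime(2023, 8, 15, 10, 20, 0),  # 14 minutes after the exception
}

suspects = pods_near_exception(pod_finish_times, exception_time)
print(suspects)
```

If most of the undeleted pods show up in `suspects`, that would be consistent with the guess above; if they are spread evenly over time, the cause is probably something else.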

