arjunanan6 commented on issue #33402: URL: https://github.com/apache/airflow/issues/33402#issuecomment-1680258882
> I have two questions:
>
> 1. I see in the log:
>
> ```
> Deleted pod: TaskInstanceKey(dag_id='SAP_WMD_AGWI_RSRVD_TIMEOUT_H', task_id='SAP_WMD_AGWI_RSRVD_TIMEOUT_H', run_id='scheduled__2023-08-15T10:05:00+00:00', try_number=1, map_index=-1) in namespace airflow-test-ns
> ```
>
> Was this pod actually deleted by the executor? (You can check a new task if you don't have the information.)
>
> 2. If the answer to my first question is yes, then we can conclude that the problem is partial and the executor is removing most of the pods. So the second question: were the undeleted pods completed within the same time range as the `KubernetesJobWatcher` exception (within a minute before or after it)?
>
> My guess is that we lose some events when the job watcher fails/restarts.

Yes, the pod was actually deleted in that instance, and the same goes for all the other "Deleted pod" log messages. The executor just stops removing pods at a certain point after that. For example, in the log from earlier, the last pod it deleted was "IT_ERPBUT_GRAC_ACTION_USAGE_SYNC_H", but it then fails to remove pods starting from "IT_ERPBUT_GRAC_BATCH_RISK_ANALYSIS_H".

If I kill the scheduler now, it removes all the Completed pods it previously couldn't, until the KubernetesJobWatcher dies again at some point (typically 20-30 minutes after launch, from my observation).

I prefer not to change `AIRFLOW__KUBERNETES_EXECUTOR__DELETE_WORKER_PODS_ON_FAILURE` because I do want to see errored-out pods. More specifically so because the tasks are executed successfully; it's just that the `Completed` pods stop getting removed. For now I have a cronjob that removes Completed pods every hour, acting as a bandaid fix.
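The hourly cleanup cronjob mentioned above could be sketched as a Kubernetes CronJob like the following. This is an illustrative manifest, not the author's actual one: the namespace `airflow-test-ns` comes from the log line quoted earlier, while the names `completed-pod-cleaner` and `pod-cleaner` are hypothetical, and the `pod-cleaner` ServiceAccount is assumed to have RBAC permission to list and delete pods in that namespace. It relies on the fact that pods shown as `Completed` by `kubectl get pods` are in the `Succeeded` phase, which `kubectl`'s `--field-selector` can match on.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: completed-pod-cleaner   # hypothetical name
  namespace: airflow-test-ns    # namespace taken from the log above
spec:
  schedule: "0 * * * *"         # hourly, matching the bandaid described
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-cleaner  # assumed SA with list/delete on pods
          restartPolicy: Never
          containers:
            - name: cleaner
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                # "Completed" in kubectl output corresponds to phase Succeeded
                - kubectl delete pods --field-selector=status.phase=Succeeded -n airflow-test-ns
```

Scoping the delete to `status.phase=Succeeded` is what keeps failed (errored-out) pods around for inspection, which is the reason given for not enabling `AIRFLOW__KUBERNETES_EXECUTOR__DELETE_WORKER_PODS_ON_FAILURE`-style cleanup.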
