droppoint commented on PR #35800: URL: https://github.com/apache/airflow/pull/35800#issuecomment-1833844894
> Hi @droppoint let us know what you find

My team and I ran an experiment demonstrating that even if the scheduler shuts down abnormally, the TaskInstance still completes normally. The same holds for the DagRun and LocalTaskJob of that TaskInstance. The TaskInstance completes normally because its state is changed inside the [_run_raw_task](https://github.com/apache/airflow/blob/main/airflow/models/taskinstance.py#L2292) function, which runs in the worker pod.

Here's a step-by-step breakdown of our experiment:

0. Set the number of schedulers in the namespace to 2.
1. Create a DAG that sleeps for 5 minutes.
2. Set orphaned_tasks_check_interval to 20 minutes.
3. Run the DAG on scheduler №1.
4. Wait until DAGRun/Job/TaskInstance/Pod is in the "Running" state.
5. Kill scheduler №1 and prevent its restart.
6. Wait until the pod is in the Completed state.
7. Wait until adoption starts on scheduler №2.
8. Wait until the cleanup-pods cronjob starts.

Results:

- TaskInstance/DAGRun/Job status changed to "success" after step 6 but before step 7.
- The pod was deleted only after step 8.

So, pods that complete after a scheduler's abnormal shutdown do not lead to TaskInstance/DagRun/Job failure, even if they were not "adopted."

Although the pod was deleted after step 8 by the cleanup-pods cronjob, I understand the concern raised by @JCoder01 that we need to clean up pods properly even in this case.

As the next step, we'll attempt to implement a new version of the _adopt_completed_pods function that retrieves the IDs of the running SchedulerJobs and deletes all pods in the Completed state that don't belong to those "running" SchedulerJobs, as @dstandish suggested. We'll test this solution on our Airflow setup and provide more information around next week.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
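
The cleanup rule proposed in the comment above (delete Completed pods whose owning SchedulerJob is no longer running) can be sketched as a pure selection function. This is only an illustration under stated assumptions, not the PR's implementation: `PodInfo` and `select_orphaned_completed_pods` are hypothetical names, the `airflow-worker` label is assumed to carry the ID of the scheduler job that launched the pod, and the "Completed" status shown by kubectl is assumed to correspond to the pod phase `Succeeded`. Listing and deleting pods through the Kubernetes API is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class PodInfo:
    """Hypothetical, minimal view of a worker pod (not an Airflow or Kubernetes type)."""
    name: str
    phase: str  # Kubernetes pod phase, e.g. "Running" or "Succeeded"
    labels: dict = field(default_factory=dict)


# Assumption: this label holds the ID of the scheduler job that launched the pod.
OWNER_LABEL = "airflow-worker"


def select_orphaned_completed_pods(pods, running_scheduler_job_ids):
    """Return names of completed pods whose owning scheduler job is not running."""
    running = {str(job_id) for job_id in running_scheduler_job_ids}
    orphaned = []
    for pod in pods:
        if pod.phase != "Succeeded":
            continue  # only completed pods are cleanup candidates
        owner = pod.labels.get(OWNER_LABEL)
        if owner is None:
            continue  # unlabelled pod: skip rather than guess the owner
        if owner not in running:
            orphaned.append(pod.name)
    return orphaned


if __name__ == "__main__":
    pods = [
        PodInfo("task-a", "Succeeded", {OWNER_LABEL: "1"}),  # owner was killed
        PodInfo("task-b", "Succeeded", {OWNER_LABEL: "2"}),  # owner still running
        PodInfo("task-c", "Running", {OWNER_LABEL: "1"}),    # not completed yet
    ]
    print(select_orphaned_completed_pods(pods, running_scheduler_job_ids=[2]))
```

Keeping the selection separate from the deletion call makes the rule easy to test without a cluster; in the experiment's setup, the pod started by scheduler №1 would be selected once only scheduler №2 is running.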
