seelmann edited a comment on issue #5420: [AIRFLOW-4797] Fix zombie detection
URL: https://github.com/apache/airflow/pull/5420#issuecomment-502779460

Yes, correct, the `_find_zombies()` function returns zombies for all the DAGs (or an empty list within the 10-second window). But its caller, https://github.com/apache/airflow/blob/93de2ce7c337e6aad240a7a043a6bd3fc86f2bc2/airflow/utils/dag_processing.py#L1217, then only starts processors for `n` DAG files, and only those receive the list of zombies. Subsequent processors for other DAG files, started within the 10-second window, just get an empty list.

The list of (all or none) zombies is passed down via `DagFileProcessor` and `SchedulerJob.process_file()` to `DagBag.kill_zombies()` https://github.com/apache/airflow/blob/93de2ce7c337e6aad240a7a043a6bd3fc86f2bc2/airflow/models/dagbag.py#L271, which then checks whether each zombie belongs to the DAG and, if so, kills it.

This is far too complex for something as simple as detecting zombie task instances and killing them. Last Friday I spent 5 hours debugging to find the cause. I wonder whether it would be better to remove the zombie detection from `DagFileProcessorManager`, along with all the passing of the list around, and just implement the query within `DagBag.kill_zombies()`, which can search only for its own DAGs. There, a 10-second delay makes sense. WDYT?
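A simplified, synchronous simulation of the flow described above may make the problem easier to see. It is only a sketch under stated assumptions: `GlobalZombieFinder`, `simulate_current` and `simulate_proposed` are hypothetical names, not Airflow APIs; zombies are modeled as IDs mapped to the DAG file they belong to; there is one heartbeat per second, and each heartbeat starts processors for the next `n` files in a round-robin queue (real processors run asynchronously and are started as slots free up). With four files and `n=2`, the zombie query always coincides with the heartbeat that processes `a.py` and `b.py`, so a zombie in `d.py` is never killed, while a per-file query as proposed catches it on the first pass.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical constant mirroring the 10-second window mentioned above.
ZOMBIE_QUERY_INTERVAL = 10


class GlobalZombieFinder:
    """Hypothetical stand-in for the manager-level `_find_zombies()`:
    returns every zombie at most once per interval, an empty list otherwise."""

    def __init__(self, zombies: Dict[str, str]) -> None:
        self.zombies = zombies  # zombie task instance id -> DAG file it belongs to
        self.last_query: Optional[int] = None

    def find(self, now: int) -> List[str]:
        if self.last_query is not None and now - self.last_query < ZOMBIE_QUERY_INTERVAL:
            return []  # inside the window: nothing is reported
        self.last_query = now
        return list(self.zombies)  # all zombies, for all DAGs


def simulate_current(files: List[str], zombies: Dict[str, str], n: int, heartbeats: int) -> Set[str]:
    """One heartbeat per second; each starts processors for the next `n` files,
    and all of them receive the same all-or-nothing zombie list."""
    finder = GlobalZombieFinder(zombies)
    queue = deque(files)
    killed: Set[str] = set()
    for now in range(heartbeats):
        found = finder.find(now)
        for _ in range(n):
            path = queue.popleft()
            queue.append(path)  # round-robin: re-queue the file
            # Like DagBag.kill_zombies(): keep only zombies of this file's DAGs.
            killed.update(z for z in found if zombies[z] == path)
    return killed


def simulate_proposed(files: List[str], zombies: Dict[str, str], n: int, heartbeats: int) -> Set[str]:
    """Each processor queries zombies for its own file, throttled per file."""
    last_query: Dict[str, int] = {}
    queue = deque(files)
    killed: Set[str] = set()
    for now in range(heartbeats):
        for _ in range(n):
            path = queue.popleft()
            queue.append(path)
            if path in last_query and now - last_query[path] < ZOMBIE_QUERY_INTERVAL:
                continue  # this file was checked less than 10 seconds ago
            last_query[path] = now
            killed.update(z for z, f in zombies.items() if f == path)
    return killed


files = ["a.py", "b.py", "c.py", "d.py"]
zombies = {"d_task_1": "d.py"}
print(simulate_current(files, zombies, n=2, heartbeats=60))   # set()
print(simulate_proposed(files, zombies, n=2, heartbeats=60))  # {'d_task_1'}
```

The all-or-nothing list ties zombie delivery to whichever files happen to be started in the same heartbeat as the query; scoping the query to each file's `DagBag` removes that coupling.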
