seelmann edited a comment on issue #5420: [AIRFLOW-4797] Fix zombie detection
URL: https://github.com/apache/airflow/pull/5420#issuecomment-502779460
 
 
   Yes, correct: the `_find_zombies()` function returns the zombies for all 
DAGs (or an empty list within the 10-second window). But the caller of this 
function 
https://github.com/apache/airflow/blob/93de2ce7c337e6aad240a7a043a6bd3fc86f2bc2/airflow/utils/dag_processing.py#L1217
 then starts processors for only `n` DAG files, and only those receive the 
list of zombies; subsequent processors for other DAG files just get an empty 
list.
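   
   To make the effect concrete, here is a minimal toy model of that 
behaviour. It is only a sketch: `ToyManager`, `start_batch()` and 
`simulate()` are hypothetical names made up for illustration, not Airflow's 
actual code, and the clock is faked so the example runs instantly.

```python
import time


class ToyManager:
    """Toy stand-in for the manager (hypothetical, not Airflow's real class)."""

    def __init__(self, zombies, interval=10.0, clock=time.monotonic):
        self._zombies = list(zombies)
        self._interval = interval
        self._clock = clock
        self._last_query = None

    def find_zombies(self):
        """Return all zombies, or [] if called again within the interval."""
        now = self._clock()
        if self._last_query is not None and now - self._last_query < self._interval:
            return []
        self._last_query = now
        return list(self._zombies)

    def start_batch(self, dag_files):
        """Start one batch of processors; each one gets the same zombie list."""
        zombies = self.find_zombies()
        return {dag_file: zombies for dag_file in dag_files}


def simulate(dag_files, n, zombies, seconds_between_batches=1.0):
    """Start processors n files at a time and record what each one received."""
    now = [0.0]  # fake clock, advanced manually
    manager = ToyManager(zombies, clock=lambda: now[0])
    received = {}
    for i in range(0, len(dag_files), n):
        received.update(manager.start_batch(dag_files[i:i + n]))
        now[0] += seconds_between_batches
    return received


received = simulate(
    dag_files=["a.py", "b.py", "c.py", "d.py"],
    n=2,
    zombies=[("dag_c", "task_1")],  # this zombie belongs to a DAG in c.py
)
for dag_file, zombies in received.items():
    print(dag_file, zombies)
```

   In this model the processors for `a.py` and `b.py` receive the zombie, 
which is not theirs, while the processor for `c.py` starts inside the 
10-second window and gets an empty list, so it never sees its own zombie.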
   
   The list of (all or none) zombies is passed down via `DagFileProcessor` and 
`SchedulerJob.process_file()` to `DagBag.kill_zombies()` 
https://github.com/apache/airflow/blob/93de2ce7c337e6aad240a7a043a6bd3fc86f2bc2/airflow/models/dagbag.py#L271
 which then checks whether each zombie belongs to the DAG and, if so, kills it.
   
   This is far too complex for something as simple as detecting zombie task 
instances and killing them. Last Friday I spent 5 hours debugging to find the 
cause.
   
   I wonder whether it would be better to remove the zombie detection from 
`DagFileProcessorManager`, along with all the passing around of the list, and 
just implement the query within `DagBag.kill_zombies()`, which would only 
need to search for zombies in its own DAGs; there a 10-second delay makes 
sense. WDYT?
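
   A rough sketch of what I mean, again as a toy model rather than a patch: 
`ToyDagBag` and `ToyTaskInstance` are hypothetical, the database query is 
replaced by filtering an in-memory list, and "killing" just means returning 
the matching task instances.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTaskInstance:
    """Hypothetical record standing in for a task instance row."""
    dag_id: str
    task_id: str
    is_zombie: bool


class ToyDagBag:
    """Toy DagBag that looks for zombies itself, only among its own DAGs."""

    def __init__(self, dag_ids, interval=10.0, clock=time.monotonic):
        self.dag_ids = set(dag_ids)
        self._interval = interval
        self._clock = clock
        self._last_check = None

    def kill_zombies(self, task_instances):
        """Return the zombies of this bag's DAGs, at most once per interval."""
        now = self._clock()
        if self._last_check is not None and now - self._last_check < self._interval:
            return []  # throttled: this bag checked recently
        self._last_check = now
        # Stand-in for a query restricted to this bag's dag_ids.
        return [
            ti for ti in task_instances
            if ti.is_zombie and ti.dag_id in self.dag_ids
        ]


now = [0.0]  # fake clock, advanced manually
task_instances = [
    ToyTaskInstance("dag_a", "task_1", is_zombie=False),
    ToyTaskInstance("dag_c", "task_1", is_zombie=True),
]
bag_a = ToyDagBag(["dag_a"], clock=lambda: now[0])
bag_c = ToyDagBag(["dag_c"], clock=lambda: now[0])

print(bag_a.kill_zombies(task_instances))
print(bag_c.kill_zombies(task_instances))
```

   Because every bag keeps its own timestamp, the throttle of one DAG file 
can no longer hide the zombies of another, and nothing has to be passed down 
from the manager.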

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
