[
https://issues.apache.org/jira/browse/AIRFLOW-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ash Berlin-Taylor updated AIRFLOW-4797:
---------------------------------------
Fix Version/s: 1.10.4
> Zombie detection and killing is not deterministic
> -------------------------------------------------
>
> Key: AIRFLOW-4797
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4797
> Project: Apache Airflow
> Issue Type: Bug
> Components: scheduler
> Affects Versions: 1.10.3
> Reporter: Stefan Seelmann
> Assignee: Stefan Seelmann
> Priority: Major
> Fix For: 1.10.4
>
>
> Zombie detection and killing is done within the DAG file processing loop.
> Within one iteration only a subset of the DAG files is processed (config
> scheduler.max_threads). The loop sleeps for the rest of the second, until the
> next iteration runs, which processes the next subset of DAG files. The
> function to get zombie task instances only returns zombies once within 10
> seconds; otherwise an empty list is returned.
> That means zombies are detected only in every 10th iteration of the DAG file
> processing loop, and they are killed only if the zombie task belongs to one
> of the DAG files processed in that iteration.
> We run into the worst-case scenario with max_threads=2 and 20 DAGs. In such a
> scenario only zombies of the same 2 DAGs are killed. (As loop iterations are
> not exactly 1s, the window shifts slowly and eventually the zombies are
> killed, but in one example it took 33 minutes.)
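
The interaction described above can be illustrated with a small simulation. This is only a sketch of the reported behaviour, not Airflow's scheduler code: the function name `dags_checked_for_zombies` and its parameters are hypothetical, and it assumes that each iteration takes exactly 1s, that DAG files are processed in fixed round-robin batches of `max_threads`, and that zombies are returned on every 10th iteration.

```python
def dags_checked_for_zombies(num_dags, max_threads, detect_every=10, iterations=1000):
    """Return the indices of DAG files whose zombies could ever be killed.

    Simplified model (assumption): iteration i processes a round-robin batch
    of `max_threads` DAG files, and the zombie list is non-empty only on
    every `detect_every`-th iteration.
    """
    covered = set()
    for i in range(iterations):
        # Batch of DAG files handled in this loop iteration.
        start = (i * max_threads) % num_dags
        batch = [(start + k) % num_dags for k in range(max_threads)]
        # Zombies can only be killed when detection returns them *and*
        # their DAG file is part of the current batch.
        if i % detect_every == 0:
            covered.update(batch)
    return covered


print(sorted(dags_checked_for_zombies(num_dags=20, max_threads=2)))  # [0, 1]
print(len(dags_checked_for_zombies(num_dags=21, max_threads=2)))     # 21
```

With 20 DAGs and max_threads=2, a full pass over all files takes exactly 10 iterations, so detection always coincides with the same two files. With 21 DAGs the batch start drifts by one position per detection, so every file is eventually covered in this model, which is similar in spirit to the slow shift caused by iterations not being exactly 1s.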