[ https://issues.apache.org/jira/browse/AIRFLOW-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953657#comment-16953657 ]
ASF subversion and git services commented on AIRFLOW-4797:
----------------------------------------------------------

Commit b0ec8716f0ecddd0cb6621bc981ccba12e74cabb in airflow's branch refs/heads/v1-10-stable from Kevin Yang
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=b0ec871 ]

[AIRFLOW-4797] Use same zombies in all DAG file processors

(cherry picked from commit cb0dbe309b518813529ddf7545ae942e5767f5e5)


> Zombie detection and killing is not deterministic
> -------------------------------------------------
>
>                 Key: AIRFLOW-4797
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4797
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10.3
>            Reporter: Stefan Seelmann
>            Assignee: Stefan Seelmann
>            Priority: Major
>             Fix For: 1.10.4
>
>
> Zombie detection and killing is done within the DAG file processing loop.
> Within one iteration only a subset of the DAG files is processed (config
> scheduler.max_threads). The loop sleeps for the rest of the second, until the
> next iteration runs, which processes the next subset of DAG files. The
> function that gets zombie task instances returns zombies at most once every 10
> seconds; otherwise it returns an empty list.
> That means zombies are detected only in every 10th iteration of the DAG file
> processing loop, and they are killed only if the zombie tasks belong to one of
> the DAG files of that iteration.
> We ran into the worst-case scenario with max_threads=2 and 20 DAGs. In such a
> scenario only zombies of the same 2 DAGs are killed. (As loop iterations are
> not exactly 1s, the window shifts slowly and eventually the zombies are
> killed, but in one example it took 33 minutes.)
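
The worst case described in the issue can be illustrated with a small simulation. This is only a sketch of the behaviour as reported, not Airflow's scheduler code: the function name, its parameters, the round-robin ordering of DAG files, and the assumption that every iteration takes exactly one second are all hypothetical simplifications.

```python
def dags_receiving_zombies(num_dags=20, max_threads=2, iterations=200,
                           iteration_seconds=1, zombie_interval=10):
    """Return the DAG indices whose processors ever see a non-empty zombie list.

    Simplified model (assumptions, not Airflow internals):
    - DAG files are processed round-robin, max_threads per loop iteration.
    - Each iteration takes exactly iteration_seconds.
    - The zombie lookup returns results at most once per zombie_interval
      seconds and an empty list otherwise.
    """
    covered = set()
    last_lookup = None  # time of the last lookup that returned zombies
    next_dag = 0
    for i in range(iterations):
        now = i * iteration_seconds
        # DAG files handled in this iteration.
        batch = [(next_dag + k) % num_dags for k in range(max_threads)]
        next_dag = (next_dag + max_threads) % num_dags
        # Only the batch that coincides with a non-empty lookup can kill zombies.
        if last_lookup is None or now - last_lookup >= zombie_interval:
            last_lookup = now
            covered.update(batch)
    return covered


if __name__ == "__main__":
    print(sorted(dags_receiving_zombies()))
```

Under these assumptions, 20 DAGs with max_threads=2 take 10 iterations to cycle through, which matches the 10-second lookup interval, so the same two DAGs receive the zombie list every time. In the real scheduler the iterations are not exactly one second long, which is why the window drifts and the remaining zombies are eventually killed.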