[ 
https://issues.apache.org/jira/browse/AIRFLOW-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953657#comment-16953657
 ] 

ASF subversion and git services commented on AIRFLOW-4797:
----------------------------------------------------------

Commit b0ec8716f0ecddd0cb6621bc981ccba12e74cabb in airflow's branch 
refs/heads/v1-10-stable from Kevin Yang
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=b0ec871 ]

[AIRFLOW-4797] Use same zombies in all DAG file processors

(cherry picked from commit cb0dbe309b518813529ddf7545ae942e5767f5e5)


> Zombie detection and killing is not deterministic
> -------------------------------------------------
>
>                 Key: AIRFLOW-4797
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4797
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10.3
>            Reporter: Stefan Seelmann
>            Assignee: Stefan Seelmann
>            Priority: Major
>             Fix For: 1.10.4
>
>
> Zombie detection and killing is done within the DAG file processing loop. 
> Within one iteration only a subset of the DAG files is processed (config 
> scheduler.max_threads). The loop sleeps for the rest of the second, until the 
> next iteration runs, which processes the next subset of DAG files. The 
> function to get zombie task instances only returns zombies once within 10 
> seconds; otherwise an empty list is returned.
> That means zombies are detected only in every 10th iteration of the DAG file 
> processing loop. And they are killed only if the zombie tasks belong to one of 
> the DAG files of the current iteration.
> We ran into the worst-case scenario with max_threads=2 and 20 DAGs. In such a 
> scenario only zombies of the same 2 DAGs are killed. (As loop iterations are 
> not exactly 1s, it shifts slowly and eventually the zombies are killed, but in 
> one example it took 33 minutes.)
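
The interaction described in the issue can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not Airflow's scheduler code: the function name and parameters are hypothetical, each loop iteration is assumed to take exactly one second, and DAG files are assumed to be processed round-robin in batches of max_threads.

```python
def dags_checked_for_zombies(num_dags, max_threads, detect_interval, iterations):
    """Return the indices of DAG files whose zombies could ever be killed.

    Hypothetical model: one loop iteration per second, DAG files handled
    round-robin in batches of `max_threads`, and zombie detection yielding
    a result at most once every `detect_interval` seconds.
    """
    checked = set()
    last_detection = None
    start = 0
    for now in range(iterations):  # `now` is the elapsed time in whole seconds
        # DAG files handled in this iteration
        batch = [(start + k) % num_dags for k in range(max_threads)]
        start = (start + max_threads) % num_dags
        # Zombies are only returned if the detection interval has elapsed
        if last_detection is None or now - last_detection >= detect_interval:
            last_detection = now
            # Only zombies belonging to the current batch can be killed
            checked.update(batch)
    return checked


# Worst case from the issue: 20 DAG files, max_threads=2, 10 s detection interval
print(sorted(dags_checked_for_zombies(20, 2, 10, 600)))
```

In this idealised model the detection interval and the round-robin cycle both last 10 iterations, so the same two DAG files are checked every time. Real iterations are not exactly one second long, which is why the reporter saw the window drift and the zombies eventually being killed.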



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
