[ 
https://issues.apache.org/jira/browse/AIRFLOW-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098819#comment-17098819
 ] 

Timur Rubeko edited comment on AIRFLOW-5506 at 5/4/20, 10:00 AM:
-----------------------------------------------------------------

I'm not sure that all the comments here relate to the same issue; there 
appear to be several different scenarios.

I can confirm, though, that I also observe the issue (the scheduler logs 
multiple "Killing PID XXXX" messages) with a LocalExecutor and a PostgreSQL 
backend. It reproduces even with a single DAG and a single task. After "some 
time" the scheduler starts emitting the "Killing PID XXXX" messages and stops 
executing tasks.

I confirm that removing the SLA from the DAG "fixes" (works around) the issue 
for me.

My configuration: Airflow 1.10.9, scheduler running in a Kubernetes (k8s) pod, 
PostgreSQL backend, LocalExecutor.
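
To check whether a scheduler log shows the same symptom, the messages can be 
counted per PID. The following is only a minimal sketch, not part of Airflow: 
the function name count_kills and the regular expression are assumptions based 
on the log format quoted in the issue description below.

```python
import re
from collections import Counter

# Pattern assumed from the log lines quoted in the issue description, e.g.
# "[2019-08-29 11:17:13,542] {scheduler_job.py:214} WARNING - Killing PID 199809"
KILL_RE = re.compile(r"WARNING - Killing PID (\d+)")


def count_kills(lines):
    """Return a Counter mapping each PID to its number of 'Killing PID' messages."""
    counts = Counter()
    for line in lines:
        match = KILL_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

The function accepts any iterable of lines, so an open log file can be passed 
to it directly. A PID that is counted more than once matches the repeated 
messages shown in the issue description.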


was (Author: trubeko):
I'm not sure that all the comments here relate to the same issue; there 
appear to be several different scenarios.

I can confirm, though, that I also observe the issue (the scheduler logs 
multiple "Killing PID XXXX" messages) with a LocalExecutor and a PostgreSQL 
backend. It reproduces even with a single DAG and a single task when running 
the scheduler itself as a Kubernetes pod. After "some time" the scheduler 
starts emitting the "Killing PID XXXX" messages and stops executing tasks.

I confirm that removing the SLA from the DAG "fixes" (works around) the issue 
for me.

> Airflow scheduler stuck
> -----------------------
>
>                 Key: AIRFLOW-5506
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5506
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10.4, 1.10.5, 1.10.6
>            Reporter: t oo
>            Priority: Major
>
> re-post of 
> [https://stackoverflow.com/questions/57713394/airflow-scheduler-stuck] and 
> slack discussion
>  
>  
> I'm testing the use of Airflow, and after triggering a (seemingly) large 
> number of DAGs at the same time, it seems to just fail to schedule anything 
> and starts killing processes. These are the logs the scheduler prints:
> {{[2019-08-29 11:17:13,542] \{scheduler_job.py:214} WARNING - Killing PID 199809
> [2019-08-29 11:17:13,544] \{scheduler_job.py:214} WARNING - Killing PID 199809
> [2019-08-29 11:17:44,614] \{scheduler_job.py:214} WARNING - Killing PID 2992
> [2019-08-29 11:17:44,614] \{scheduler_job.py:214} WARNING - Killing PID 2992
> [2019-08-29 11:18:15,692] \{scheduler_job.py:214} WARNING - Killing PID 5174
> [2019-08-29 11:18:15,693] \{scheduler_job.py:214} WARNING - Killing PID 5174
> [2019-08-29 11:18:46,765] \{scheduler_job.py:214} WARNING - Killing PID 22410
> [2019-08-29 11:18:46,766] \{scheduler_job.py:214} WARNING - Killing PID 22410
> [2019-08-29 11:19:17,845] \{scheduler_job.py:214} WARNING - Killing PID 42177
> [2019-08-29 11:19:17,846] \{scheduler_job.py:214} WARNING - Killing PID 42177
> ...}}
> I'm using a LocalExecutor with a PostgreSQL backend DB. It seems to happen 
> only after I trigger a large number (>100) of DAGs at about the same time 
> using external triggering, as in:
> {{airflow trigger_dag DAG_NAME}}
> After I wait for it to finish killing whatever processes it is killing, it 
> starts executing all of the tasks properly. I don't even know what these 
> processes were, as I can't really see them after they are killed...
> Has anyone encountered this kind of behavior? Any idea why that would happen?
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
