[ https://issues.apache.org/jira/browse/AIRFLOW-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Zhang closed AIRFLOW-1329.
-------------------------------
    Resolution: Invalid

> Problematic DAG cause worker queue saturated
> --------------------------------------------
>
>                 Key: AIRFLOW-1329
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1329
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>            Reporter: Zhen Zhang
>
> We have seen this odd issue in our production Airflow cluster:
> # A user has a problematic import statement in a DAG definition.
> # For reasons still unknown, our scheduler and workers have different 
> PYTHONPATH settings, so the scheduler is able to parse the DAG 
> successfully, but the workers fail on import.
> # What we observed is that, on the worker side, all the tasks in the 
> problematic DAG are in the "queued" state, while on the scheduler side, the 
> scheduler keeps requeueing hundreds of thousands of duplicate tasks. As a 
> result, it quickly saturates the worker queue and prevents normal tasks from running. 
> I think a better way to handle this would be either to mark the user task as 
> failed or to have the scheduler rate-limit the requeueing of duplicate tasks, 
> isolating user errors and problematic workers from core Airflow cluster 
> functionality.
>  
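A minimal sketch of the two remedies suggested in the issue (rate-limit requeueing, then mark the task failed), assuming the scheduler could consult a per-task guard before re-sending work. `RequeueLimiter`, `enqueue_or_fail`, and the `(dag_id, task_id, execution_date)` key are hypothetical names for illustration only; they are not part of Airflow's scheduler or executor API.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Hashable


class RequeueLimiter:
    """Hypothetical sliding-window limiter on sends of the same task key.

    Allows at most `max_requeues` sends per key within `window_seconds`.
    """

    def __init__(self, max_requeues: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requeues = max_requeues
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested
        self._sends: Dict[Hashable, Deque[float]] = defaultdict(deque)

    def allow(self, key: Hashable) -> bool:
        now = self.clock()
        sends = self._sends[key]
        # Drop send timestamps that have fallen out of the sliding window.
        while sends and now - sends[0] >= self.window_seconds:
            sends.popleft()
        if len(sends) >= self.max_requeues:
            return False
        sends.append(now)
        return True


def enqueue_or_fail(limiter: RequeueLimiter, key: Hashable,
                    send: Callable[[Hashable], None],
                    mark_failed: Callable[[Hashable], None]) -> None:
    """Send the task if under the limit; otherwise mark it failed (hypothetical hooks)."""
    if limiter.allow(key):
        send(key)
    else:
        mark_failed(key)


if __name__ == "__main__":
    key = ("example_dag", "broken_task", "2017-06-20T00:00:00")
    limiter = RequeueLimiter(max_requeues=3, window_seconds=60)
    print([limiter.allow(key) for _ in range(5)])  # [True, True, True, False, False]
```

Keying the limit on the task instance rather than on the whole queue means one broken DAG exhausts only its own budget, so unrelated tasks keep flowing to the workers.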



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
