[
https://issues.apache.org/jira/browse/AIRFLOW-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhen Zhang closed AIRFLOW-1329.
-------------------------------
Resolution: Invalid
> Problematic DAG cause worker queue saturated
> --------------------------------------------
>
> Key: AIRFLOW-1329
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1329
> Project: Apache Airflow
> Issue Type: Improvement
> Components: scheduler
> Reporter: Zhen Zhang
>
> We see this odd issue in our production Airflow cluster:
> # A user has a problematic import statement in a DAG definition.
> # For reasons still unknown, our scheduler and workers have different
> PYTHONPATH settings, so the scheduler parses the DAG successfully
> but the workers fail on import.
> # On the worker side, all the tasks in the problematic DAG stay in the
> "queued" state, while on the scheduler side, the scheduler keeps
> requeueing hundreds of thousands of duplicate tasks. As a result, it
> quickly saturates the worker queue and blocks normal tasks from running.
> I think a better way to handle this would be either to mark the user task as
> failed or to have the scheduler rate-limit the requeueing of duplicate tasks,
> so that user errors and problematic workers are isolated from core Airflow
> cluster functionality.
>
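The rate limit suggested in the description could look roughly like the sketch below. It is only an illustration of the idea, not Airflow code: RequeueLimiter, should_requeue, max_requeues, window_seconds, and the shape of the task key are all hypothetical names chosen for this example, and nothing here reflects how the scheduler is actually implemented.

```python
import time
from collections import defaultdict, deque


class RequeueLimiter:
    """Hypothetical helper (not part of Airflow): caps requeues per task key."""

    def __init__(self, max_requeues=3, window_seconds=60.0, clock=time.monotonic):
        self.max_requeues = max_requeues      # allowed requeues per window
        self.window_seconds = window_seconds  # length of the sliding window
        self.clock = clock                    # injectable so tests can fake time
        self._history = defaultdict(deque)    # task key -> recent requeue times

    def should_requeue(self, task_key):
        """Return True and record the attempt if the key is under its limit."""
        now = self.clock()
        attempts = self._history[task_key]
        # Forget attempts that have fallen out of the sliding window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_requeues:
            return False
        attempts.append(now)
        return True
```

A caller would key the limiter on whatever identifies a task instance (for example a tuple of DAG id, task id, and run id) and, when should_requeue returns False, mark that task as failed instead of sending another copy to the worker queue. That combines both options from the description: duplicates are throttled, and a task that keeps bouncing ends up failed rather than crowding out normal tasks.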
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)