[jira] [Commented] (AIRFLOW-5660) Scheduler becomes unresponsive when processing large DAGs on kubernetes.

Ash Berlin-Taylor (Jira) Thu, 12 Dec 2019 03:51:49 -0800


    [ https://issues.apache.org/jira/browse/AIRFLOW-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994581#comment-16994581 ]


Ash Berlin-Taylor commented on AIRFLOW-5660:
--------------------------------------------

[~adivish] Are you able to share your DAG file with us? (Most of the tasks can 
be replaced with PythonOperator/DummyOperator, since we only need the DAG's 
structure, not the task logic.) You might also want to check out 
[https://github.com/apache/airflow/pull/6792], which makes the DagFileProcessor 
about 2x quicker in my testing.

> Scheduler becomes unresponsive when processing large DAGs on kubernetes.
> ------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5660
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5660
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executor-kubernetes
>    Affects Versions: 1.10.5
>            Reporter: Aditya Vishwakarma
>            Assignee: Daniel Imberman
>            Priority: Major
>             Fix For: 1.10.7
>
>
> For very large DAGs (10,000+ tasks) and high parallelism, the scheduling loop 
> can take 5-10 minutes or more.
> It seems that the `_labels_to_key` function in kubernetes_executor loads all 
> task instances with a given execution date into memory, and it does so for 
> every task in progress. So if 100 tasks of a DAG with 10,000 tasks are in 
> progress, it will load a million task instances from the DB on every 
> scheduler tick.
> [https://github.com/apache/airflow/blob/caf1f264b845153b9a61b00b1a57acb7c320e743/airflow/contrib/executors/kubernetes_executor.py#L598]
> A quick fix is to query the DB for the task directly before falling back to 
> a full scan. I can submit a PR for it.
> A proper fix requires persisting a mapping of (safe_dag_id, safe_task_id, 
> dag_id, task_id, execution_date) somewhere, probably in the metadata database.
>  
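
A minimal sketch of the quick fix described in the issue, in Python, using an in-memory SQLite table as a stand-in for Airflow's metadata database. `labels_to_key`, `make_safe_label_value`, `build_demo_db`, and the simplified `task_instance` schema are hypothetical illustrations, not Airflow's actual code or API; the sanitizer only approximates what the executor does, and `execution_date` is assumed to arrive already decoded to the stored string. The point is the lookup order: one query on the full primary key first, and the per-execution-date scan only when the labels were rewritten by sanitizing.

```python
import hashlib
import re
import sqlite3
from typing import Optional, Tuple

# Kubernetes label values are limited to 63 characters.
MAX_LABEL_LEN = 63

# (dag_id, task_id, execution_date, try_number)
TaskKey = Tuple[str, str, str, int]


def make_safe_label_value(value: str) -> str:
    """Hypothetical, simplified label sanitizer (not Airflow's exact rules)."""
    safe = re.sub(r"[^a-zA-Z0-9_.-]", "", value)
    if safe != value or len(safe) > MAX_LABEL_LEN:
        # A short hash of the original keeps distinct ids from colliding.
        suffix = hashlib.md5(value.encode()).hexdigest()[:9]
        safe = safe[: MAX_LABEL_LEN - len(suffix) - 1] + "-" + suffix
    return safe


def labels_to_key(conn: sqlite3.Connection,
                  labels: dict) -> Tuple[Optional[TaskKey], int]:
    """Map pod labels back to a task key; also return how many rows were fetched."""
    dag_id = labels["dag_id"]
    task_id = labels["task_id"]
    execution_date = labels["execution_date"]  # assumed already decoded
    try_number = int(labels.get("try_number", "1"))

    # Quick path: when the ids were already label-safe, the labels equal the
    # real ids, so one lookup on the primary key finds the row.
    row = conn.execute(
        "SELECT dag_id, task_id FROM task_instance "
        "WHERE dag_id = ? AND task_id = ? AND execution_date = ?",
        (dag_id, task_id, execution_date),
    ).fetchone()
    if row is not None:
        return (row[0], row[1], execution_date, try_number), 1

    # Fallback: the sanitizer rewrote an id, so load every task instance for
    # this execution date and compare sanitized values (the original cost).
    rows = conn.execute(
        "SELECT dag_id, task_id FROM task_instance WHERE execution_date = ?",
        (execution_date,),
    ).fetchall()
    for real_dag_id, real_task_id in rows:
        if (make_safe_label_value(real_dag_id) == dag_id
                and make_safe_label_value(real_task_id) == task_id):
            return (real_dag_id, real_task_id, execution_date, try_number), len(rows)
    return None, len(rows)


def build_demo_db(n_tasks: int, execution_date: str) -> sqlite3.Connection:
    """Hypothetical demo schema: one DAG with n_tasks safe ids plus one unsafe id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, execution_date TEXT, "
        "PRIMARY KEY (dag_id, task_id, execution_date))"
    )
    rows = [("big_dag", f"task_{i}", execution_date) for i in range(n_tasks)]
    rows.append(("big_dag", "weird task!", execution_date))
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    ex = "2019-12-12T00:00:00"
    conn = build_demo_db(10_000, ex)
    print(labels_to_key(conn, {"dag_id": "big_dag", "task_id": "task_42",
                               "execution_date": ex}))
    print(labels_to_key(conn, {"dag_id": "big_dag",
                               "task_id": make_safe_label_value("weird task!"),
                               "execution_date": ex}))
```

With 10,000 task instances for one execution date, a task whose ids were already label-safe is resolved by fetching a single row, while a task whose id had to be sanitized still fetches all 10,001 rows. That remaining scan is what the persisted (safe_dag_id, safe_task_id, dag_id, task_id, execution_date) mapping proposed in the issue would remove.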



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
