[ 
https://issues.apache.org/jira/browse/AIRFLOW-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994808#comment-16994808
 ] 

ASF GitHub Bot commented on AIRFLOW-5660:
-----------------------------------------

ashb commented on pull request #6340: [AIRFLOW-5660] Attempt to find the task 
in DB from Kubernetes pod labels
URL: https://github.com/apache/airflow/pull/6340
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Scheduler becomes unresponsive when processing large DAGs on kubernetes.
> ------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5660
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5660
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executor-kubernetes
>    Affects Versions: 1.10.5
>            Reporter: Aditya Vishwakarma
>            Assignee: Daniel Imberman
>            Priority: Major
>             Fix For: 1.10.7
>
>
> For very large DAGs (10,000+ tasks) with high parallelism, the scheduling 
> loop can take more than 5-10 minutes. 
> It seems that the `_labels_to_key` function in kubernetes_executor loads all 
> tasks with a given execution date into memory, and it does so for every task 
> in progress. So, if 100 tasks of a DAG with 10,000 tasks are in progress, it 
> will load a million task rows from the DB on every tick of the scheduler.
> [https://github.com/apache/airflow/blob/caf1f264b845153b9a61b00b1a57acb7c320e743/airflow/contrib/executors/kubernetes_executor.py#L598]
> A quick fix is to search for the task in the DB directly before falling back 
> to a full scan. I can submit a PR for it.
> A proper fix requires persisting a mapping of (safe_dag_id, safe_task_id, 
> dag_id, task_id, execution_date) somewhere, probably in the metadatabase.
>  
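
The quick fix described in the issue (try a direct lookup first, and only
then fall back to scanning every task for the execution date) might look
roughly like the sketch below. This is not the code from the pull request:
it uses an in-memory sqlite3 table in place of the Airflow metadatabase,
and `make_safe_label`, `build_db`, and `labels_to_key` are hypothetical
names chosen for illustration. The second return value counts the rows
loaded, to show why the fallback is expensive (100 in-progress tasks x
10,000 rows each = 1,000,000 rows per scheduler tick).

```python
import sqlite3


def make_safe_label(value: str) -> str:
    """Hypothetical stand-in for label sanitising (NOT Airflow's real rule).

    Characters outside [A-Za-z0-9-_.] become '-', and the result is cut to
    63 characters, so the original id cannot always be recovered.
    """
    cleaned = "".join(c if c.isalnum() or c in "-_." else "-" for c in value)
    return cleaned[:63]


def build_db(dag_id: str, n_tasks: int, execution_date: str) -> sqlite3.Connection:
    """Create a toy task_instance table standing in for the metadatabase."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        " dag_id TEXT, task_id TEXT, execution_date TEXT,"
        " PRIMARY KEY (task_id, dag_id, execution_date))"
    )
    conn.executemany(
        "INSERT INTO task_instance VALUES (?, ?, ?)",
        [(dag_id, f"task_{i}", execution_date) for i in range(n_tasks)],
    )
    return conn


def labels_to_key(conn: sqlite3.Connection, labels: dict):
    """Resolve pod labels to (dag_id, task_id, execution_date).

    Returns (key, rows_loaded). The direct lookup loads at most one row;
    the fallback loads every task instance for the execution date.
    """
    dag_label = labels["dag_id"]
    task_label = labels["task_id"]
    execution_date = labels["execution_date"]

    # Quick path: assume the labels equal the original ids (common case).
    row = conn.execute(
        "SELECT dag_id, task_id, execution_date FROM task_instance"
        " WHERE dag_id = ? AND task_id = ? AND execution_date = ?",
        (dag_label, task_label, execution_date),
    ).fetchone()
    if row is not None:
        return tuple(row), 1

    # Fallback: full scan, comparing sanitised ids against the labels.
    rows = conn.execute(
        "SELECT dag_id, task_id, execution_date FROM task_instance"
        " WHERE execution_date = ?",
        (execution_date,),
    ).fetchall()
    for dag_id, task_id, ex_date in rows:
        if make_safe_label(dag_id) == dag_label and make_safe_label(task_id) == task_label:
            return (dag_id, task_id, ex_date), len(rows)
    return None, len(rows)
```

With this shape, the full scan is only paid for tasks whose ids were
altered when they were turned into labels. The "proper fix" from the issue
would remove the fallback entirely by storing the label-to-id mapping.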



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
