kushsharma opened a new issue #20249:
URL: https://github.com/apache/airflow/issues/20249


   ### Apache Airflow version
   
   2.2.1
   
   ### What happened
   
   We have an airflow instance for approximately 6k DAGs. 
   - If we delete a DAG from UI, the UI times out
   - If we delete a DAG from CLI, it completes but sometimes takes up to a 
half-hour to finish.
   
   Most of the execution time appears to be consumed in database queries. I 
know I can just throw more CPU and memory to the db instance and hope it works 
but I think we can do better during delete operation. Correct me if I am wrong 
but I think this is the code that gets executed when deleting a DAG from UI or 
CLI via `delete_dag.py`
   ```python
       for model in models.base.Base._decl_class_registry.values():
           if hasattr(model, "dag_id"):
               if keep_records_in_log and model.__name__ == 'Log':
                   continue
               cond = or_(model.dag_id == dag_id, model.dag_id.like(dag_id + 
".%"))
               count += 
session.query(model).filter(cond).delete(synchronize_session='fetch')
       if dag.is_subdag:
           parent_dag_id, task_id = dag_id.rsplit(".", 1)
           for model in TaskFail, models.TaskInstance:
               count += (
                   session.query(model).filter(model.dag_id == parent_dag_id, 
model.task_id == task_id).delete()
               )
   ```
   
   I see we are iterating over all the models and doing a `dag_id` match. Some 
of the tables don't have an index over `dag_id` column like `job` which is 
making this operation really slow. This could be one easy fix for this issue.
   
   For example, the following query took 20 mins to finish in 16cpu 32gb 
Postgres instance:
   ```
   SELECT job.id AS job_id FROM job WHERE job.dag_id = $1 OR job.dag_id LIKE $2
   ```
   This is just one of the many queries that are being executed during the 
delete operation.
   
   ### What you expected to happen
   
   Deletion of DAG should not take this much time.
   
   ### How to reproduce
   
   _No response_
   
   ### Operating System
   
   nix
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to