kushsharma opened a new issue #20249:
URL: https://github.com/apache/airflow/issues/20249
### Apache Airflow version
2.2.1
### What happened
We have an airflow instance for approximately 6k DAGs.
- If we delete a DAG from UI, the UI times out
- If we delete a DAG from CLI, it completes but sometimes takes up to a
half-hour to finish.
Most of the execution time appears to be consumed in database queries. I
know I can just throw more CPU and memory to the db instance and hope it works
but I think we can do better during delete operation. Correct me if I am wrong
but I think this is the code that gets executed when deleting a DAG from UI or
CLI via `delete_dag.py`
```python
for model in models.base.Base._decl_class_registry.values():
if hasattr(model, "dag_id"):
if keep_records_in_log and model.__name__ == 'Log':
continue
cond = or_(model.dag_id == dag_id, model.dag_id.like(dag_id +
".%"))
count +=
session.query(model).filter(cond).delete(synchronize_session='fetch')
if dag.is_subdag:
parent_dag_id, task_id = dag_id.rsplit(".", 1)
for model in TaskFail, models.TaskInstance:
count += (
session.query(model).filter(model.dag_id == parent_dag_id,
model.task_id == task_id).delete()
)
```
I see we are iterating over all the models and doing a `dag_id` match. Some
of the tables don't have an index over `dag_id` column like `job` which is
making this operation really slow. This could be one easy fix for this issue.
For example, the following query took 20 mins to finish in 16cpu 32gb
Postgres instance:
```
SELECT job.id AS job_id FROM job WHERE job.dag_id = $1 OR job.dag_id LIKE $2
```
This is just one of the many queries that are being executed during the
delete operation.
### What you expected to happen
Deletion of DAG should not take this much time.
### How to reproduce
_No response_
### Operating System
nix
### Versions of Apache Airflow Providers
_No response_
### Deployment
Other Docker-based deployment
### Deployment details
_No response_
### Anything else
_No response_
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]