jvstein opened a new issue, #61894:
URL: https://github.com/apache/airflow/issues/61894
### Apache Airflow version
3.1.7
### If "Other Airflow 3 version" selected, which one?
_No response_
### What happened?
I'm testing the upgrade of an Airflow 2.11.0 instance to 3.1.7 in Kubernetes
and was observing repeated restarts of the dag-processor during initial start
up due to failed liveness checks. After removing the liveness probe, the
processor was able to show the actual errors. The dag-processor was
successfully starting parses of the individual DAG files (up to the process
limit), but then each DAG parse would stall. The dag-processor would sit there
consuming 0% CPU resources and then eventually each DAG parse step would time
out with a message like this and then a new batch would start and the process
would repeat:
> Processor for DagFileInfo(rel_path=PosixPath('my/dag/path.py'),
bundle_name='dags-folder', bundle_path=PosixPath('/opt/airflow/dags'),
bundle_version=None) with PID 19 started 807 ago killing it.
During this, I observed a very slow query in the upgraded Airflow database.
> SELECT EXISTS (SELECT *
FROM task_instance
WHERE task_instance.dag_version_id =
'019c453a-63af-77d4-8a7f-edd1832cae1e'::UUID) AS anon_1
Our `task_instance` table has over 27M rows and the `dag_version_id` column
is not indexed, so this was taking tens of minutes to do a full table scan.
After patching our instance with this change, the dag-processor was able to
proceed.
https://github.com/apache/airflow/compare/3.1.7...jvstein:airflow:fix_slow_task_instance_query
### What you think should happen instead?
The query in question does a full table scan on a very large table. There
appears to be an available `dag_id` value in the function, which should be used
for filtering in addition to the `dag_version_id` value.
### How to reproduce
Start with a very large number of records in the `task_instance` table. Run
the dag-processor. The initial parsing of DAGs should be very slow to proceed.
### Operating System
Debian bookworm
### Versions of Apache Airflow Providers
apache-airflow-providers-amazon==9.21.0
apache-airflow-providers-apache-iceberg==1.4.1
apache-airflow-providers-celery==3.15.2
apache-airflow-providers-cncf-kubernetes==10.12.3
apache-airflow-providers-common-compat==1.13.0
apache-airflow-providers-common-io==1.7.1
apache-airflow-providers-common-sql==1.30.4
apache-airflow-providers-fab==3.2.0
apache-airflow-providers-google==19.5.0
apache-airflow-providers-hashicorp==4.4.3
apache-airflow-providers-http==5.6.4
apache-airflow-providers-mysql==6.4.2
apache-airflow-providers-postgres==6.5.3
apache-airflow-providers-sendgrid==4.2.1
apache-airflow-providers-slack==9.6.2
apache-airflow-providers-smtp==2.4.2
apache-airflow-providers-standard==1.11.0
apache-airflow-providers-trino==6.4.2
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
Deployed via helm to k8s. Not heavily customized.
### Anything else?
In our upgraded database with lots of history, this happened every time.
### Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]