rawwar opened a new issue, #53401: URL: https://github.com/apache/airflow/issues/53401
### Description

When the TI table is huge (~10M records), the `_end_spans_of_externally_ended_ops` method ([Link](https://github.com/apache/airflow/blob/912ccace3532e3c05a46f6a1cf9457c0de41db6b/airflow-core/src/airflow/jobs/scheduler_job_runner.py#L1085C9-L1085C43)) takes a very long time to finish (about 75 seconds in our case). This particular SQLAlchemy query was taking too long:

```
tis_should_end: list[TaskInstance] = session.scalars(
    select(TaskInstance).where(TaskInstance.span_status == SpanStatus.SHOULD_END)
).all()
```

After adding the following index, performance improved significantly:

```
create index idx_span_status on task_instance (id, span_status);
```

### Use case/motivation

Improve scheduler performance when the TI table has a lot of records.

### Related issues

_No response_

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
