Arunodoy18 opened a new pull request, #62108: URL: https://github.com/apache/airflow/pull/62108
### Problem The TaskInstance listing API query can suffer from performance degradation when executed on large datasets, particularly in MySQL environments. From investigation and community discussion, the main contributing factors are: - Multiple nested joins to the same tables increasing query complexity - Default sorting on `TaskInstance.start_date`, which is not indexed by default - UUID stored as VARCHAR in MySQL leading to slower comparisons compared to native UUID support in PostgreSQL This can cause slow response times in the TaskInstance API endpoint and UI views that depend on it. --- ### Root Cause The SQLAlchemy query used by the TaskInstance API endpoint performs redundant or nested joins to related tables. This increases query planning complexity and execution cost, especially for large TaskInstance datasets. Additionally: - Sorting by non-indexed columns forces full scans in some database engines. - MySQL UUID storage as VARCHAR amplifies comparison overhead. --- ### Solution This PR focuses on optimizing query construction while preserving API behaviour and response schema. Changes include: - Reducing duplicate or nested joins where not required - Using more targeted relationship loading strategies - Preserving ordering and pagination behaviour - Ensuring compatibility with both MySQL and PostgreSQL No schema changes or index changes are introduced in this PR. --- ### Performance Considerations The goal is to reduce query complexity and execution cost without altering database schema or API contract. The changes aim to: - Reduce join overhead - Maintain stable query plans across supported databases - Preserve default UI sorting behaviour --- ### Tests Added / updated tests to ensure: - API response schema remains unchanged - Ordering behaviour remains correct - Pagination behaviour remains correct - Query execution remains valid across supported database backends All existing tests pass locally. --- ### Backward Compatibility No breaking changes introduced. No database schema changes. No new dependencies added. --- closes: #62027 related: #61212 --- ##### Was generative AI tooling used to co-author this PR? - [X] Yes (ChatGPT for reasoning and design validation, GitHub Copilot for implementation suggestions, test scaffolding review, and lint compliance assistance) Generated-by: ChatGPT and GitHub Copilot following Apache Airflow Gen-AI contribution guidelines -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
