Arunodoy18 opened a new pull request, #62108:
URL: https://github.com/apache/airflow/pull/62108

   ### Problem
   
   The TaskInstance listing API query can suffer from performance degradation 
when executed on large datasets, particularly in MySQL environments.
   
   From investigation and community discussion, the main contributing factors 
are:
   
   - Multiple nested joins to the same tables increasing query complexity
   - Default sorting on `TaskInstance.start_date`, which is not indexed by 
default
   - UUID stored as VARCHAR in MySQL leading to slower comparisons compared to 
native UUID support in PostgreSQL
   
   This can cause slow response times in the TaskInstance API endpoint and UI 
views that depend on it.
   
   ---
   
   ### Root Cause
   
   The SQLAlchemy query used by the TaskInstance API endpoint performs 
redundant or nested joins to related tables.  
   This increases query planning complexity and execution cost, especially for 
large TaskInstance datasets.
   
   Additionally:
   - Sorting by non-indexed columns forces full scans in some database engines.
   - MySQL UUID storage as VARCHAR amplifies comparison overhead.
   
   ---
   
   ### Solution
   
   This PR focuses on optimizing query construction while preserving API 
behaviour and response schema.
   
   Changes include:
   
   - Reducing duplicate or nested joins where not required
   - Using more targeted relationship loading strategies
   - Preserving ordering and pagination behaviour
   - Ensuring compatibility with both MySQL and PostgreSQL
   
   No schema changes or index changes are introduced in this PR.
   
   ---
   
   ### Performance Considerations
   
   The goal is to reduce query complexity and execution cost without altering 
database schema or API contract.
   
   The changes aim to:
   - Reduce join overhead
   - Maintain stable query plans across supported databases
   - Preserve default UI sorting behaviour
   
   ---
   
   ### Tests
   
   Added / updated tests to ensure:
   
   - API response schema remains unchanged
   - Ordering behaviour remains correct
   - Pagination behaviour remains correct
   - Query execution remains valid across supported database backends
   
   All existing tests pass locally.
   
   ---
   
   ### Backward Compatibility
   
   No breaking changes introduced.
   No database schema changes.
   No new dependencies added.
   
   ---
   
   closes: #62027
   related: #61212
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes (ChatGPT for reasoning and design validation, GitHub Copilot for 
implementation suggestions, test scaffolding review, and lint compliance 
assistance)
   
   Generated-by: ChatGPT and GitHub Copilot following Apache Airflow Gen-AI 
contribution guidelines
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to