ingaleniranjan365 opened a new pull request, #68316:
URL: https://github.com/apache/airflow/pull/68316
## Performance issue
**File**: `airflow-core/src/airflow/jobs/scheduler_job_runner.py:2234`
On large Airflow deployments (hundreds of DAGs), `_create_dag_runs()` emits
a `WHERE (dag_id, logical_date) IN (('dag1', ts1), ..., ('dagN', tsN))` clause
with O(N) bind parameters on **every scheduler heartbeat** (default: every 5
seconds). PostgreSQL's query planner re-plans this from scratch each time,
causing perceived scheduler "hanging" — the root cause reported in #61453 (26
👍).
The same pattern was already fixed for a neighbouring call site in PR #62114
(merged 2026-03-13). This PR fixes the remaining location.
## Fix
Replace the single unbounded `tuple_.in_()` with a chunked loop of ≤1000-row
batches so the SQL parameter count is bounded regardless of fleet size.
Semantics are identical — the results are combined into the same
`existing_dagruns` dict before use.
## Evidence
Before: 1 query with O(N) bind parameters (N = number of scheduled DAGs) —
re-planned by Postgres on every 5s heartbeat.
After: ⌈N/1000⌉ queries each with ≤2000 bind parameters — stable query plan,
no planner thrashing.
## Validation
- Test harness: `pytest airflow-core/tests/unit/jobs/test_scheduler_job.py
-k test_create_dag_runs`
- Tests pass after fix: ✅ (7/7 passed, SQLite in-memory)
- Fix scope: domain-free, independent, 1 file / +12 -10 lines
Fixes part of: #61453
🌀 Magic applied with [Wibey VS Code
Extension](https://wibey.walmart.com/code) 🪄
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]