shahar1 opened a new pull request, #67702:
URL: https://github.com/apache/airflow/pull/67702
Combine the two sequential iterations over `dag.task_dict` in
`DagSerialization.serialize_dag()` into a single pass.
## Root cause
`serialize_dag()` iterated `dag.task_dict` twice in sequence:
```python
# pass 1 — full task serialization
serialized_dag["tasks"] = [cls.serialize(task) for _, task in
dag.task_dict.items()]
# pass 2 — dependency detection (separate loop over the same dict)
dag_deps = [
dep
for task in dag.task_dict.values()
for dep in OperatorSerialization.detect_dependencies(task)
]
```
## Fix
Combine into one loop:
```python
serialized_tasks = []
dag_deps = []
for task in dag.task_dict.values():
serialized_tasks.append(cls.serialize(task))
dag_deps.extend(OperatorSerialization.detect_dependencies(task))
serialized_dag["tasks"] = serialized_tasks
```
This eliminates the redundant `dict.values()` traversal and produces a
single,
linear, readable flow through the task list.
## What this does — and doesn't — fix
The primary win is **code clarity**: instead of two separate comprehensions
with different loop variables, there is one loop that accumulates both
outputs.
The redundant dict traversal itself is not the dominant cost inside the
second
pass. The more expensive work there — `ensure_serialized_asset()`
encode+decode
roundtrips for every outlet on every task — is unchanged by the loop merge.
That is a separate follow-on optimization opportunity.
## Benchmark results
Benchmark script:
https://gist.github.com/shahar1/069a9a8fb20896ac6b603b3462a77faa
Run with: `uv run --project airflow-core python dev/bench_dep_single_pass.py`
### End-to-end `DagSerialization.to_dict()` (warm venv, parallel baseline
run)
| Scenario | Before (min) | After (min) | Note |
|---|---|---|---|
| 10 tasks × 1 outlet | ~2.4 ms | ~2.4 ms | within noise |
| 100 tasks × 1 outlet | ~16.7 ms | ~17.9 ms | within noise |
| 100 tasks × 5 outlets | ~21.7 ms | ~26.8 ms | within noise |
| 500 tasks × 5 outlets | ~104.7 ms | ~106.1 ms | within noise |
| 1000 tasks × 5 outlets | ~208.5 ms | ~219.2 ms | within noise |
| 200 tasks × 20 outlets | ~89.3 ms | ~80.9 ms | −9 % |
Wall-clock improvement from the loop merge alone is within measurement noise
for most scenarios — see the note above on where the real cost lives.
### Isolated dep-pass cost (separate loop vs inline)
| Scenario | Separate pass | Inline |
|---|---|---|
| 100 tasks × 5 outlets | 2.8 ms | 3.1 ms |
| 500 tasks × 5 outlets | 14.9 ms | 27.1 ms* |
| 1000 tasks × 5 outlets | 55.1 ms | 48.3 ms |
| 200 tasks × 20 outlets | 33.3 ms | 32.1 ms |
\* The 500-task outlier is measurement noise from a concurrent system load
spike.
---
##### Was generative AI tooling used to co-author this PR?
- [X] Yes — Claude Code (Sonnet 4.6)
Generated-by: Claude Code (Sonnet 4.6) following [the
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
---
Drafted-by: Claude Code (Sonnet 4.6); reviewed by @shahar1 before posting
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]