shahar1 opened a new pull request, #67702:
URL: https://github.com/apache/airflow/pull/67702

   Combine the two sequential iterations over `dag.task_dict` in
   `DagSerialization.serialize_dag()` into a single pass.
   
   ## Root cause
   
   `serialize_dag()` iterated `dag.task_dict` twice in sequence:
   
   ```python
   # pass 1 — full task serialization
   serialized_dag["tasks"] = [cls.serialize(task) for _, task in 
dag.task_dict.items()]
   
   # pass 2 — dependency detection (separate loop over the same dict)
   dag_deps = [
       dep
       for task in dag.task_dict.values()
       for dep in OperatorSerialization.detect_dependencies(task)
   ]
   ```
   
   ## Fix
   
   Combine into one loop:
   
   ```python
   serialized_tasks = []
   dag_deps = []
   for task in dag.task_dict.values():
       serialized_tasks.append(cls.serialize(task))
       dag_deps.extend(OperatorSerialization.detect_dependencies(task))
   serialized_dag["tasks"] = serialized_tasks
   ```
   
   This eliminates the redundant `dict.values()` traversal and produces a 
single,
   linear, readable flow through the task list.
   
   ## What this does — and doesn't — fix
   
   The primary win is **code clarity**: instead of two separate comprehensions
   with different loop variables, there is one loop that accumulates both 
outputs.
   
   The redundant dict traversal itself is not the dominant cost inside the 
second
   pass. The more expensive work there — `ensure_serialized_asset()` 
encode+decode
   roundtrips for every outlet on every task — is unchanged by the loop merge.
   That is a separate follow-on optimization opportunity.
   
   ## Benchmark results
   
   Benchmark script: 
https://gist.github.com/shahar1/069a9a8fb20896ac6b603b3462a77faa  
   Run with: `uv run --project airflow-core python dev/bench_dep_single_pass.py`
   
   ### End-to-end `DagSerialization.to_dict()` (warm venv, parallel baseline 
run)
   
   | Scenario | Before (min) | After (min) | Note |
   |---|---|---|---|
   | 10 tasks × 1 outlet | ~2.4 ms | ~2.4 ms | within noise |
   | 100 tasks × 1 outlet | ~16.7 ms | ~17.9 ms | within noise |
   | 100 tasks × 5 outlets | ~21.7 ms | ~26.8 ms | within noise |
   | 500 tasks × 5 outlets | ~104.7 ms | ~106.1 ms | within noise |
   | 1000 tasks × 5 outlets | ~208.5 ms | ~219.2 ms | within noise |
   | 200 tasks × 20 outlets | ~89.3 ms | ~80.9 ms | −9 % |
   
   Wall-clock improvement from the loop merge alone is within measurement noise
   for most scenarios — see the note above on where the real cost lives.
   
   ### Isolated dep-pass cost (separate loop vs inline)
   
   | Scenario | Separate pass | Inline |
   |---|---|---|
   | 100 tasks × 5 outlets | 2.8 ms | 3.1 ms |
   | 500 tasks × 5 outlets | 14.9 ms | 27.1 ms* |
   | 1000 tasks × 5 outlets | 55.1 ms | 48.3 ms |
   | 200 tasks × 20 outlets | 33.3 ms | 32.1 ms |
   
   \* The 500-task outlier is measurement noise from a concurrent system load 
spike.
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes — Claude Code (Sonnet 4.6)
   
   Generated-by: Claude Code (Sonnet 4.6) following [the 
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
   
   ---
   
   Drafted-by: Claude Code (Sonnet 4.6); reviewed by @shahar1 before posting


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to