shahar1 opened a new pull request, #67288:
URL: https://github.com/apache/airflow/pull/67288

   ## Summary
   
   Replaces the modified-Kahn body of `TaskGroup.topological_sort` (and the 
mirror in `SerializedTaskGroup`) with a projected sweep:
   
   - Each child's per-task `upstream_task_ids` is projected onto sibling-level 
integer indices **once** up front (`_project_child_deps`).
   - The sweep (`_sweep_projection`) then runs against the projection with a 
`bytearray`-backed emission flag — per-edge work happens once per sort instead 
of once per outer-loop pass.
   
   Emission order is identical to the previous implementation. The existing 
order-sensitive tests (`test_topological_sort1/2`, 
`test_topological_nested_groups`, `test_topological_group_dep`) cover the 
contract.
   
   ## What this PR does *not* do
   
   Keeps the legacy emission semantics; this PR is a constant-factor win across 
realistic shapes. It does **not** fix the algorithmic O(N²) blowup on heavily 
reverse-declared DAGs (the "build all tasks, then wire dependencies" factory 
pattern). That is deferred to a follow-up PR that adds a pass-numbering Kahn's 
traversal behind a back-edge threshold.
   
   ## Benchmark
   
   Min of three runs, N=2000 children, single-level `TaskGroup`, ms per call. 
Source: <https://gist.github.com/shahar1/9c61dc9f34f7e77cd29cfb9d67af7ceb>
   
   | Shape | main | PR | speedup |
   |---|---:|---:|---:|
   | chain (forward-declared) | 1.42 | 0.73 | **1.94x** |
   | diamond (root → many → sink) | 2.21 | 0.86 | **2.57x** |
   | independent (no deps) | 0.61 | 0.33 | **1.84x** |
   | layered (4-layer full bipartite) | 237.9 | 69.0 | **3.45x** |
   | rev-chain (adversarial reverse insertion) | 884.9 | 108.8 | 8.1x |
   | nested groups | ~noise | ~noise | — |
   
   The 8.1x on rev-chain comes from the projection precompute alone (one 
projection vs N projections); algorithmic complexity is unchanged and remains 
O(N²) on that shape until the follow-up PR.
   
   ## Test plan
   
   - [x] `task-sdk/tests/task_sdk/definitions/test_taskgroup.py` + 
`test_dag.py` — 127 passed
   - [x] `airflow-core/tests/unit/utils/test_task_group.py` — 19 passed
   - [x] `airflow-core/tests/unit/serialization/` (task-group / topological 
filter) — 4 passed
   - [x] `prek run mypy-task-sdk` — passed
   - [x] `prek run mypy-airflow-core` — passed
   - [x] `prek run --from-ref upstream/main --stage pre-commit` — passed
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes — Claude Code (Opus 4.7)
   
   Generated-by: Claude Code (Opus 4.7) following [the 
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
   
   ---
   Drafted-by: Claude Code (Opus 4.7) (no human review before posting)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to