shahar1 opened a new pull request, #67288: URL: https://github.com/apache/airflow/pull/67288
## Summary Replaces the modified-Kahn body of `TaskGroup.topological_sort` (and the mirror in `SerializedTaskGroup`) with a projected sweep: - Each child's per-task `upstream_task_ids` is projected onto sibling-level integer indices **once** up front (`_project_child_deps`). - The sweep (`_sweep_projection`) then runs against the projection with a `bytearray`-backed emission flag — per-edge work happens once per sort instead of once per outer-loop pass. Emission order is identical to the previous implementation. The existing order-sensitive tests (`test_topological_sort1/2`, `test_topological_nested_groups`, `test_topological_group_dep`) cover the contract. ## What this PR does *not* do Keeps the legacy emission semantics; this PR is a constant-factor win across realistic shapes. It does **not** fix the algorithmic O(N²) blowup on heavily reverse-declared DAGs (the "build all tasks, then wire dependencies" factory pattern). That is deferred to a follow-up PR that adds a pass-numbering Kahn's traversal behind a back-edge threshold. ## Benchmark Min of three runs, N=2000 children, single-level `TaskGroup`, ms per call. Source: <https://gist.github.com/shahar1/9c61dc9f34f7e77cd29cfb9d67af7ceb> | Shape | main | PR | speedup | |---|---:|---:|---:| | chain (forward-declared) | 1.42 | 0.73 | **1.94x** | | diamond (root → many → sink) | 2.21 | 0.86 | **2.57x** | | independent (no deps) | 0.61 | 0.33 | **1.84x** | | layered (4-layer full bipartite) | 237.9 | 69.0 | **3.45x** | | rev-chain (adversarial reverse insertion) | 884.9 | 108.8 | 8.1x | | nested groups | ~noise | ~noise | — | The 8.1x on rev-chain comes from the projection precompute alone (one projection vs N projections); algorithmic complexity is unchanged and remains O(N²) on that shape until the follow-up PR. ## Test plan - [x] `task-sdk/tests/task_sdk/definitions/test_taskgroup.py` + `test_dag.py` — 127 passed - [x] `airflow-core/tests/unit/utils/test_task_group.py` — 19 passed - [x] `airflow-core/tests/unit/serialization/` (task-group / topological filter) — 4 passed - [x] `prek run mypy-task-sdk` — passed - [x] `prek run mypy-airflow-core` — passed - [x] `prek run --from-ref upstream/main --stage pre-commit` — passed --- ##### Was generative AI tooling used to co-author this PR? - [X] Yes — Claude Code (Opus 4.7) Generated-by: Claude Code (Opus 4.7) following [the guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions) --- Drafted-by: Claude Code (Opus 4.7) (no human review before posting) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
