shahar1 commented on PR #67688: URL: https://github.com/apache/airflow/pull/67688#issuecomment-4572276629
## Benchmark Results: Hybrid Optimization Impact This follow-up optimization provides dramatic improvements on the adversarial reverse-chain case that PR #67288 exposed. ### Reverse-Chain Speedup (Worst Case) | N | Sweep-only (ms) | Hybrid (ms) | Speedup | |---|---|---|---| | 100 | 0.35 | 0.09 | **3.70x** | | 500 | 7.60 | 0.47 | **16.26x** | | 1000 | 29.47 | 0.99 | **29.80x** | | 2000 | 119.10 | 2.14 | **55.75x** | ### How It Works The optimization uses a **hybrid dispatch strategy**: - **Forward-declared DAGs** (common case): Greedy multi-pass sweep, O(V + E) - **Reverse-declared DAGs** (adversarial): Pass-number traversal via Kahn's algorithm, O((V + E) log V) The dispatcher automatically detects when a DAG has many "back edges" (dependencies declared after dependents) and switches algorithms accordingly. ### Performance Progression - **Before PR #67288** (old Kahn's): ~894ms for reverse-chain N=2000 - **After PR #67288** (sweep-only): ~119ms - **With this follow-up** (hybrid): ~2.14ms ✅ Both approaches maintain the same emission order (level-by-legacy-pass), so the optimization is transparent to users. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
