Fly-Style opened a new pull request, #19562:
URL: https://github.com/apache/druid/pull/19562
**Description**

The cost-based supervisor autoscaler would not scale down a healthy, over-provisioned supervisor: one running above the ideal idle ratio with low lag stayed pinned at its current task count.

**Root cause.** The idle projection was linear:

```java
rawIdle = 1.0 - busyFraction / taskRatio; // taskRatio = proposed / current
```

This assumes busy time is fully conserved when work moves onto fewer tasks, so a reasonable consolidation projects negative idle (e.g. `1 − 0.6/0.5 = −0.2`). That value clamps to 0, the worst point of the U-shaped idle cost, and turns the projected overrun into phantom virtual lag, pinning the task count even at ~0 real lag. In reality, busy time grows sublinearly: an observed 2× consolidation raised busy ~1.25×, not 2×.

**Fix.** Redistribute busy time sublinearly:

```java
// IDLE_SUBLINEARITY_EXPONENT = 0.32
projectedBusy = busyFraction
    * Math.pow((double) currentTaskCount / proposedTaskCount, IDLE_SUBLINEARITY_EXPONENT);
rawIdle = 1.0 - projectedBusy;
```

`IDLE_SUBLINEARITY_EXPONENT = 0.32` (≈ log₂(1.25)) is a tuned constant, chosen from testing and from the observed busy growth: with this exponent, a 2× consolidation raises projected busy by 2^0.32 ≈ 1.25×. A healthy consolidation now lands near the ideal idle ratio instead of going negative, so the supervisor scales down. Because the exponent stays > 0, projected busy still grows without bound under extreme over-consolidation, so that case is still braked.

**Validation**

<details>

Optimal task count vs. observed poll-idle ratio, across realistic configs (rate = total cluster throughput, split per task):

<img width="1200" height="750" alt="cost_based_scaledown_medium_7Mpm" src="https://github.com/user-attachments/assets/45a1387a-aa00-40b8-97a9-63796425f618" />

The old version stays pinned at 128 tasks until idle reaches ~0.55, while the new version starts consolidating from ~0.32.

<img width="1200" height="750" alt="cost_based_scaledown_large_30Mpm" src="https://github.com/user-attachments/assets/36732e97-e622-4f1f-8257-6680378ea8d3" />

Safe under load: the new version consolidates earlier on the high-idle side, but at low idle both versions still jump to max, so lag-driven scale-up is unaffected.
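As a worked check of the root-cause and fix arithmetic above, here is a minimal Java sketch that evaluates both idle projections side by side. The class and method names (`IdleProjectionSketch`, `linearRawIdle`, `sublinearRawIdle`, `clampIdle`) are hypothetical and are not Druid's actual API; the sketch assumes busy and idle are fractions in [0, 1] and reproduces only the two projection formulas plus the clamp to 0, not the rest of the cost function.

```java
/**
 * Hypothetical sketch contrasting the old (linear) and new (sublinear) idle
 * projections. Names are illustrative only, not Druid's real classes/methods.
 */
public final class IdleProjectionSketch {

  // 0.32 ≈ log2(1.25): a 2x consolidation multiplies projected busy by 2^0.32 ≈ 1.25.
  static final double IDLE_SUBLINEARITY_EXPONENT = 0.32;

  private IdleProjectionSketch() {}

  /** Old projection: busy time fully conserved, idle = 1 - busy / (proposed / current). */
  static double linearRawIdle(double busyFraction, int currentTaskCount, int proposedTaskCount) {
    double taskRatio = (double) proposedTaskCount / currentTaskCount;
    return 1.0 - busyFraction / taskRatio;
  }

  /** New projection: busy grows as (current / proposed) raised to the sublinearity exponent. */
  static double sublinearRawIdle(double busyFraction, int currentTaskCount, int proposedTaskCount) {
    double projectedBusy = busyFraction
        * Math.pow((double) currentTaskCount / proposedTaskCount, IDLE_SUBLINEARITY_EXPONENT);
    return 1.0 - projectedBusy;
  }

  /** Negative projected idle clamps to 0, as described for the autoscaler. */
  static double clampIdle(double rawIdle) {
    return Math.max(0.0, rawIdle);
  }

  public static void main(String[] args) {
    double busy = 0.6;   // 40% observed idle
    int current = 128;
    int proposed = 64;   // a 2x consolidation

    double linear = linearRawIdle(busy, current, proposed);
    double sublinear = sublinearRawIdle(busy, current, proposed);
    double collapse = sublinearRawIdle(busy, current, 1); // extreme over-consolidation

    System.out.printf("linear raw idle:      %.3f (clamped %.3f)%n", linear, clampIdle(linear));
    System.out.printf("sublinear raw idle:   %.3f (clamped %.3f)%n", sublinear, clampIdle(sublinear));
    System.out.printf("128 -> 1 raw idle:    %.3f (clamped %.3f)%n", collapse, clampIdle(collapse));
  }
}
```

With `busyFraction = 0.6` and a 128 → 64 consolidation, the linear projection gives about −0.2 (clamped to 0), while the sublinear one gives about 0.25. A 128 → 1 collapse still projects negative idle under the new formula (128^0.32 ≈ 4.7), which is the braking behavior the fix relies on.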
<img width="1800" height="675" alt="cost_based_v1_vs_v2_large_30Mpm_amp0 35" src="https://github.com/user-attachments/assets/df23085a-de92-4aea-8e3d-b2791d0842ff" />

The old version is flat (pinned at max by the phantom overrun); the new version consolidates and holds more tasks as lag weight rises.

</details>

**Release note**

Fixed an issue where the cost-based supervisor autoscaler would not scale down an over-provisioned supervisor running above its ideal idle ratio with low lag.

- [x] self-reviewed.
- [x] added comments explaining the "why".
- [x] added/updated unit tests.

--
This is an automated message from the Apache Git Service.
