[PR] feat: enhance U-shape idle prediction for scale-down scenarios (druid)

via GitHub Fri, 05 Jun 2026 06:48:05 -0700


Fly-Style opened a new pull request, #19562:
URL: https://github.com/apache/druid/pull/19562


   **Description**
   
   The cost-based supervisor autoscaler would not scale down a healthy, over-provisioned supervisor: a supervisor running above its ideal idle ratio with low lag stayed pinned at its current task count.
   
   **Root cause.** The idle projection was linear:
   
   ```
   rawIdle = 1.0 - busyFraction / taskRatio;   // taskRatio = proposed / current
   ```
   
   This assumes busy time is fully conserved when work moves onto fewer tasks. As a result, a reasonable consolidation projects negative idle: halving the task count at 60% busy gives `1 − 0.6/0.5 = −0.2`. That value clamps to 0, the worst point of the U-shaped idle cost. The overrun then turns into phantom virtual lag that pins the task count even at ~0 real lag. In reality, busy grows sublinearly: an observed 2× consolidation raised busy ~1.25×, not 2×.
   
   **Fix.** Redistribute busy sublinearly:
   ```
   projectedBusy = busyFraction * Math.pow((double) currentTaskCount / proposedTaskCount, IDLE_SUBLINEARITY_EXPONENT);  // 0.32
   rawIdle = 1.0 - projectedBusy;
   ```
   `IDLE_SUBLINEARITY_EXPONENT = 0.32` (≈ log₂(1.25)) is a tuned constant, validated through testing. The value is chosen so that a 2× consolidation raises projected busy by 2^0.32 ≈ 1.25×, which matches the observed growth.
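   Here is a minimal Java sketch of the sublinear projection. The class and method names are hypothetical; only `IDLE_SUBLINEARITY_EXPONENT` and its value 0.32 come from this PR. The sketch checks the calibration (2^0.32 ≈ 1.25). It also reruns the 128 → 64 halving at 60% busy, which the linear model projects as −0.2 idle.

```java
// Hypothetical sketch of the sublinear idle projection (illustrative names, not Druid's API).
final class SublinearIdleProjection {

    // Tuned exponent from the PR; roughly log2(1.25).
    static final double IDLE_SUBLINEARITY_EXPONENT = 0.32;

    // Per-task busy grows as (current / proposed)^exponent instead of linearly.
    static double projectedBusy(double busyFraction, int currentTaskCount, int proposedTaskCount) {
        double consolidation = (double) currentTaskCount / proposedTaskCount;
        return busyFraction * Math.pow(consolidation, IDLE_SUBLINEARITY_EXPONENT);
    }

    static double rawIdle(double busyFraction, int currentTaskCount, int proposedTaskCount) {
        return 1.0 - projectedBusy(busyFraction, currentTaskCount, proposedTaskCount);
    }

    public static void main(String[] args) {
        // Calibration: a 2x consolidation multiplies busy by 2^0.32, about 1.25.
        System.out.println(Math.pow(2.0, IDLE_SUBLINEARITY_EXPONENT));
        // Halving 128 tasks to 64 at 60% busy now stays positive.
        System.out.println(rawIdle(0.6, 128, 64)); // ≈ 0.25
    }
}
```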
   
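   A healthy consolidation now lands near the ideal idle ratio instead of going negative, so the supervisor scales down. The exponent stays > 0, so projected busy still grows without bound as the task count shrinks. Extreme over-consolidation therefore still drives projected idle below zero and is braked.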
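   The following hypothetical sketch illustrates the brake on over-consolidation. It sweeps proposed task counts downward from 128 at 60% busy and reports the smallest proposal whose projected idle is still non-negative. `minNonNegativeIdleTaskCount` is an illustrative helper, not how the autoscaler chooses a task count. The cost-based autoscaler also weighs lag and the U-shaped idle cost.

```java
// Hypothetical illustration of the over-consolidation brake (illustrative names, not Druid's API).
final class ConsolidationBrake {

    static final double IDLE_SUBLINEARITY_EXPONENT = 0.32;

    static double rawIdle(double busyFraction, int currentTaskCount, int proposedTaskCount) {
        double consolidation = (double) currentTaskCount / proposedTaskCount;
        return 1.0 - busyFraction * Math.pow(consolidation, IDLE_SUBLINEARITY_EXPONENT);
    }

    // Walks proposals downward from the current count and returns the smallest one whose
    // projected idle is non-negative. Projected busy rises monotonically as the proposal
    // shrinks, so the first negative projection ends the search. Returns the current
    // count if even that projection is negative.
    static int minNonNegativeIdleTaskCount(double busyFraction, int currentTaskCount) {
        int smallestSafe = currentTaskCount;
        for (int proposed = currentTaskCount; proposed >= 1; proposed--) {
            if (rawIdle(busyFraction, currentTaskCount, proposed) < 0.0) {
                break;
            }
            smallestSafe = proposed;
        }
        return smallestSafe;
    }

    public static void main(String[] args) {
        int current = 128;
        for (int proposed : new int[] {128, 64, 32, 16, 8}) {
            System.out.printf("proposed=%d idle=%.3f%n", proposed, rawIdle(0.6, current, proposed));
        }
        System.out.println(minNonNegativeIdleTaskCount(0.6, current)); // 26
    }
}
```

   With 60% busy at 128 tasks, the sublinear projection stays non-negative down to 26 tasks. By contrast, the linear model goes negative below 77 tasks.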
   
   **Validation**
   
   <details>
   Optimal task count vs. observed poll-idle ratio, across realistic configs 
(rate = total cluster throughput, split per-task):
   
   <img width="1200" height="750" alt="cost_based_scaledown_medium_7Mpm" src="https://github.com/user-attachments/assets/45a1387a-aa00-40b8-97a9-63796425f618" />
   
   The old version stays pinned at 128 tasks until idle reaches ~0.55, while the new version starts consolidating from ~0.32.
   
   <img width="1200" height="750" alt="cost_based_scaledown_large_30Mpm" src="https://github.com/user-attachments/assets/36732e97-e622-4f1f-8257-6680378ea8d3" />
   
   Safe under load: the new version consolidates earlier on the high-idle side, but at low idle both versions still jump to max, so lag-driven scale-up is unaffected.
   
   <img width="1800" height="675" alt="cost_based_v1_vs_v2_large_30Mpm_amp0 35" src="https://github.com/user-attachments/assets/df23085a-de92-4aea-8e3d-b2791d0842ff" />
   
   The existing version is flat (pinned at max by the phantom overrun), while the new version consolidates and holds more tasks as lag weight rises.
   
   </details>
   
   **Release note**
   
   Fixed an issue where the cost-based supervisor autoscaler would not scale 
down an over-provisioned supervisor running above its ideal idle ratio with low 
lag.
   
     - [x] self-reviewed.
     - [x] added comments explaining the "why".
     - [x] added/updated unit tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

