Re: [PR] Do not kill a task if offsets are inconsistent but publish from another group is pending (druid)

via GitHub Sun, 08 Mar 2026 21:52:08 -0700


kfaraz commented on PR #19091:
URL: https://github.com/apache/druid/pull/19091#issuecomment-4021088820


   > Huh, I thought we already changed that in 
https://github.com/apache/druid/pull/18745. Could you please also update the 
comment above the code block that currently says:
`// if autoscaler is enabled, then taskCount will be ignored.`
   
   Yeah, apparently, we missed a case in `SeekableStreamSupervisorIOConfig`.
   
   > I also believe that it would be better to move away from retries 
completely. The supervisor should be able to do more to help tasks sequence 
their publishes.
   
   Yes, agreed.
   
   > I suppose if processing rate is zero but lag is high, we shouldn't scale 
at all (either up or down). The combination of those two metrics suggests 
something is broken that scaling isn't going to fix.
   
   Yeah, that sounds reasonable. Let me add that check and maybe raise an alert 
for that case as well.
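   A minimal sketch of that guard, in Java to match the codebase. The class and method names (`ScalingGuard`, `shouldSkipScaling`) and the explicit `highLagThreshold` parameter are hypothetical, not the actual `CostBasedAutoScaler` API. The only point is that the scaler returns early, neither scaling up nor down, when tasks report a zero processing rate while lag stays above some threshold.

```java
// Hypothetical sketch; these names are illustrative and not Druid's actual auto-scaler API.
public final class ScalingGuard {
    private ScalingGuard() {}

    /**
     * Returns true when the auto-scaler should take no action at all.
     * A zero processing rate combined with high lag suggests that tasks are
     * stuck (e.g. a bug), which adding or removing tasks will not fix.
     * A zero processing rate with low lag (an idle stream) is not skipped.
     */
    public static boolean shouldSkipScaling(double processingRate, long lag, long highLagThreshold) {
        return processingRate <= 0.0 && lag > highLagThreshold;
    }

    public static void main(String[] args) {
        System.out.println(shouldSkipScaling(0.0, 1_000_000L, 10_000L));   // true: likely stuck
        System.out.println(shouldSkipScaling(0.0, 0L, 10_000L));           // false: idle stream
        System.out.println(shouldSkipScaling(500.0, 1_000_000L, 10_000L)); // false: busy, lagging
    }
}
```

In the real scaler this check would presumably sit before any cost computation, and the same condition is a natural place to emit the alert.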
   
   >> CostBasedAutoScaler scales down tasks if processing rate is zero, even if 
lag is high. (I guess this shouldn't happen in practice, since if lag is high, 
tasks must be busy with some processing, except when there is a bug and task 
threads are stuck somewhere.)
   
   > What'd you have it do in this situation?
   
   Oh, this was an old comment; I later rephrased it. The actual problem was 
twofold: (1) the auto-scaler had a bug where it would do spurious scale-downs even 
when the cost was the same for all task counts, and (2) the auto-scaler did not take any 
action if the processing rate was zero. Instead, it should have skipped scaling 
only when the processing rate was zero _but_ lag was high.
   
   I will update the check accordingly.
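   A minimal sketch of how the spurious scale-down in (1) can be avoided when picking a task count from per-count costs. Everything here (`TaskCountChooser`, `chooseTaskCount`, the `Map<Integer, Double>` of costs) is a hypothetical illustration, not the actual auto-scaler code. The current task count is kept whenever its cost already equals the minimum, so a tie never triggers a change.

```java
import java.util.Map;

// Hypothetical sketch; names are illustrative and not Druid's actual auto-scaler API.
public final class TaskCountChooser {
    private TaskCountChooser() {}

    /**
     * Picks the task count with the lowest cost, but keeps the current count
     * whenever it is already among the cheapest options. Without this check,
     * a tie (e.g. every count has the same cost) resolves to whichever count
     * happens to be visited first, which can cause a spurious scale-down.
     */
    public static int chooseTaskCount(int currentTaskCount, Map<Integer, Double> costByTaskCount) {
        double minCost = Double.POSITIVE_INFINITY;
        int best = currentTaskCount;
        for (Map.Entry<Integer, Double> entry : costByTaskCount.entrySet()) {
            if (entry.getValue() < minCost) {
                minCost = entry.getValue();
                best = entry.getKey();
            }
        }
        // The current count's cost can never be below the minimum, so "<=" means a tie.
        Double currentCost = costByTaskCount.get(currentTaskCount);
        if (currentCost != null && currentCost <= minCost) {
            return currentTaskCount;
        }
        return best;
    }

    public static void main(String[] args) {
        // All costs equal: stay at 4 tasks instead of scaling down.
        System.out.println(chooseTaskCount(4, Map.of(1, 5.0, 2, 5.0, 4, 5.0)));
        // A strictly cheaper count exists: move to it.
        System.out.println(chooseTaskCount(4, Map.of(1, 5.0, 2, 2.0, 4, 5.0)));
    }
}
```

Exact comparison is used for brevity; a real cost function might need a small tolerance so that floating-point noise does not break ties.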


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

