kfaraz commented on PR #19091: URL: https://github.com/apache/druid/pull/19091#issuecomment-4021088820
> Huh, I thought we already changed that in https://github.com/apache/druid/pull/18745. Could you please also update the comment above the code block that currently says: `// if autoscaler is enabled, then taskCount will be ignored`.

Yeah, apparently, we missed a case in `SeekableStreamSupervisorIOConfig`.

> I also believe that it would be better to move away from retries completely. The supervisor should be able to do more to help tasks sequence their publishes.

Yes, agreed.

> I suppose if processing rate is zero but lag is high, we shouldn't scale at all (either up or down). The combination of those two metrics suggests something is broken that scaling isn't going to fix.

Yeah, that sounds reasonable. Let me add that check and maybe raise an alert for the same.

>> CostBasedAutoScaler scales down tasks if processing rate is zero, even if lag is high. (I guess this shouldn't happen in practice since if lag is high, tasks must be busy with some processing. Except when there is a bug and task threads are stuck somewhere.)

> What'd you have it do in this situation?

Oh, this was an old comment. I later rephrased it. The actual problem was two-fold:

1. The auto-scaler had a bug where it would do spurious scale-downs even when the cost was the same for all task counts.
2. The auto-scaler did not take any action if the processing rate was zero. Instead, it should have skipped scaling when the processing rate was zero _but_ lag was high.

I will update the check accordingly.
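A minimal sketch of what such a check could look like. All names here (`ScalingGuard`, `decide`, `highLagThreshold`) are hypothetical and are not taken from the actual `CostBasedAutoScaler` code; the threshold-based definition of "high lag" is also an assumption made for illustration.

```java
// Hypothetical sketch only: class, method, and parameter names are illustrative
// and do not correspond to Druid's actual auto-scaler API.
final class ScalingGuard {

    enum Decision {
        // Tasks appear stuck: do not scale in either direction, raise an alert instead.
        SKIP_AND_ALERT,
        // Metrics look consistent: continue with the usual cost evaluation.
        EVALUATE_COST
    }

    private ScalingGuard() {
    }

    /**
     * Skips scaling when the processing rate is zero but lag is high, since
     * that combination suggests a problem that scaling is unlikely to fix.
     *
     * @param avgProcessingRate average records processed per second across tasks
     * @param totalLag          total lag across all partitions
     * @param highLagThreshold  assumed cutoff above which lag counts as "high"
     */
    static Decision decide(double avgProcessingRate, long totalLag, long highLagThreshold) {
        boolean stalled = avgProcessingRate <= 0.0;
        boolean lagging = totalLag >= highLagThreshold;
        if (stalled && lagging) {
            return Decision.SKIP_AND_ALERT;
        }
        return Decision.EVALUATE_COST;
    }
}
```

With this shape, a zero processing rate on an idle stream (low lag) still reaches the cost evaluation, so a legitimate scale-down is not blocked.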
