mxm commented on PR #799: URL: https://github.com/apache/flink-kubernetes-operator/pull/799#issuecomment-2012067487
> > Autoscaling wouldn't have a chance to realize its SLOs.
>
> You are right. The autoscaler supports scaling parallelism and memory for now. As I understand it, downtime cannot be guaranteed even if users only scale parallelism. For example, if Flink jobs don't use the Adaptive Scheduler and the input rate changes constantly, the jobs will be rescaled frequently.

I agree that there are edge cases where the autoscaler cannot fulfill its service objectives. However, that doesn't mean we need to give up on them entirely. With restarts due to autotuning possible at any point in time, the autoscaling algorithm is inherently broken, because downtime is never factored into the autoscaling decision.

You mentioned the adaptive scheduler. Frankly, the use of the adaptive scheduler with autoscaling isn't fully developed. I would discourage users from combining it with autoscaling in its current state.

> Fortunately, scaling parallelism considers the restart time (unlike scaling memory), and then increases the parallelism somewhat.

+1

> > For this feature to be mergeable, it will either have to be disabled by default (opt-in via config)
>
> IIUC, `job.autoscaler.memory.tuning.enabled` is disabled by default. That means memory tuning is turned off by default even if this PR is merged, right?

Autoscaling is also disabled by default. I think we want to make sure autoscaling and autotuning work together collaboratively.

> > or be integrated with autoscaling, i.e. figure out a way to balance tuning / autoscaling decisions and feed back tuning decisions to the autoscaling algorithm to scale up whenever we redeploy for memory changes, to avoid falling behind and preventing autoscaling from scaling up after downtime due to memory reconfigurations.
>
> The restartTime has been considered during `computeScalingSummary`, but we may ignore it because the new parallelism is `WithinUtilizationTarget`. Do you mean we should force-adjust the parallelism to the new parallelism when memory scaling happens, even if the new parallelism is `WithinUtilizationTarget`?

True, the rescale time has been considered for the downscale / upscale processing capacity, but the current processing capacity doesn't factor in downtime. Unplanned restarts would reduce the processing capacity. If we know we are going to restart, the autoscaling algorithm should factor this in, e.g. by reducing the calculated processing capacity accordingly.
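To make the last point concrete, here is a minimal sketch of what "factoring a known restart into the capacity calculation" could look like. The class and method names (`RestartAwareCapacity`, `adjustForDowntime`) are hypothetical illustrations, not the operator's actual API: the idea is simply that all work arriving during the metrics window must be processed in the window minus the expected downtime, so the required processing rate rises proportionally.

```java
/**
 * Hypothetical sketch: inflating the required processing rate to absorb a
 * planned restart (e.g. a memory-tuning redeploy). Not the actual
 * flink-kubernetes-operator API.
 */
public final class RestartAwareCapacity {

    /**
     * Returns the processing rate the job must sustain so that, after losing
     * {@code expectedDowntimeSeconds} of processing within a window of
     * {@code windowSeconds}, it still keeps up with an incoming rate of
     * {@code targetRate} records per second and drains the backlog accumulated
     * during the restart.
     */
    public static double adjustForDowntime(
            double targetRate, double windowSeconds, double expectedDowntimeSeconds) {
        double effectiveSeconds = windowSeconds - expectedDowntimeSeconds;
        if (effectiveSeconds <= 0) {
            throw new IllegalArgumentException("Expected downtime exceeds the metrics window");
        }
        // Work arriving over windowSeconds must be processed in effectiveSeconds.
        return targetRate * (windowSeconds / effectiveSeconds);
    }

    public static void main(String[] args) {
        // 600s window, 60s expected restart downtime: required capacity
        // rises by roughly 11% over the steady-state target rate.
        double adjusted = adjustForDowntime(1000.0, 600.0, 60.0);
        System.out.println(adjusted);
    }
}
```

Under this sketch, the autoscaler would feed the adjusted rate (rather than the raw target rate) into its parallelism decision whenever a memory-tuning redeploy is already scheduled, so the downtime is paid for up front instead of surfacing as lag afterwards.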
