1996fanrui commented on PR #920: URL: https://github.com/apache/flink-kubernetes-operator/pull/920#issuecomment-2500331341
Thanks @gyfora and @mxm for the quick comments and suggestions!

> So if you have 10 vertices and they would be scaled down at different times you can have 10 restarts within the scale down window. Which does not feel right.

Actually, we only rescale twice (instead of 10 times) if we have 10 vertices that would be scaled down at different times. Assuming the scale down interval is 1 hour:

- vertex1 triggers scale down at 12:10,
- vertex2 triggers scale down at 12:15,
- vertex3 triggers scale down at 12:18,
- vertex10 triggers scale down at 12:50.

Then vertex1 is scaled down at 13:10, and the rest of them are not changed. After 13:10, if vertex2 to vertex10 all still need to scale down, all of them trigger scale down at 13:10 and are scaled down at 14:10.

> I think we should consider setting global scale down windows, for example one possible implementation would be that if the scale down interval is 1 hour and we have one vertex requesting the scale down first at 12:10 the second at 12:20, then we scale both down at 13:20 (instead of doing 2 scale downs). So basically coalescing the intervals to some extent to really max 1 scale down per hour

There may be an unexpected case with this approach, and it can easily happen when the scale down interval is >= 24 hours. Assume the scale down interval is 24 hours and the peak hours are as follows:

- vertex1 18:00
- vertex2 20:00
- vertex3 22:00

Trigger times:

- vertex1 triggers scale down at 2024-11-20 19:00,
- vertex2 triggers scale down at 2024-11-20 21:00,
- vertex3 triggers scale down at 2024-11-20 23:00.

So this strategy expects the scale down to be executed at 2024-11-21 23:00. If all vertices still need to scale down after 24 hours, it works well. The unexpected case is this: vertex2 always wants to scale down, but vertex1 and vertex3 run at their peak-time parallelism during their peak hours. The scale down trigger for vertex1 is canceled at 2024-11-21 18:00, and vertex1 re-triggers scale down at 2024-11-21 19:00.

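
To make the arithmetic concrete, here is a minimal sketch of the coalescing rule as I read it from the quoted proposal (12:10 and 12:20 with a 1 hour interval gives 13:20, i.e. latest trigger time plus the interval). The function name `coalesced_deadline` and the dict of trigger times are hypothetical and only for illustration; this is not the operator's actual code.

```python
from datetime import datetime, timedelta


def coalesced_deadline(triggers, interval):
    """Assumed rule: one job-wide scale down at (latest trigger time + interval)."""
    return max(triggers.values()) + interval


# The 1 hour example from the quoted proposal: 12:10 and 12:20 -> 13:20.
# The date is arbitrary; only the time of day matters here.
one_hour_deadline = coalesced_deadline(
    {
        "vertex1": datetime(2024, 11, 20, 12, 10),
        "vertex2": datetime(2024, 11, 20, 12, 20),
    },
    timedelta(hours=1),
)

# The 24 hour example: three vertices trigger one hour after their peak.
interval = timedelta(hours=24)
triggers = {
    "vertex1": datetime(2024, 11, 20, 19, 0),
    "vertex2": datetime(2024, 11, 20, 21, 0),
    "vertex3": datetime(2024, 11, 20, 23, 0),
}
first_deadline = coalesced_deadline(triggers, interval)

# vertex1 cancels its trigger at its next peak (2024-11-21 18:00) and
# re-triggers one hour later, before first_deadline is reached.
triggers["vertex1"] = datetime(2024, 11, 21, 19, 0)
second_deadline = coalesced_deadline(triggers, interval)

print(one_hour_deadline, first_deadline, second_deadline)
```

Under this assumed rule, every re-trigger by any vertex moves the job-wide deadline, even though vertex2's own trigger time never changes.
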
At this time, the trigger statuses are:

- vertex1 triggers scale down at 2024-11-21 19:00,
- vertex2 triggers scale down at 2024-11-20 21:00,
- vertex3 triggers scale down at 2024-11-20 23:00.

The strategy now expects the scale down to be executed at 2024-11-22 19:00. But vertex3 will also cancel and re-trigger, so the execution time is postponed again. This loop repeats every day, and the scale down of vertex2 never happens.

Please correct me if I misunderstand anything, and I’m happy to hear more suggestions from you.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
