mxm commented on PR #762: URL: https://github.com/apache/flink-kubernetes-operator/pull/762#issuecomment-1908759173
> @mxm would you mind adding a few example calculations when memory would be scaled up or down based on the processing rate and heap usage metrics?

An example:

```
num_task_managers = 10
tm_memory = 40 Gb
observed_max_heap_size = 30 Gb
observed_average_heap_usage = 15 Gb
```

This will configure the heap memory to 15 Gb and reduce the total memory by `num_task_managers * 15 Gb = 150 Gb`, if the expected true processing rate does not change. If it changes, the memory is updated as follows:

```
data_change_rate = sum(expected_processing_rate) / sum(current_processing_rate)
total_heap = num_task_managers * observed_average_heap_usage * data_change_rate
tm_memory = total_heap / num_task_managers_after_rescale
```

The total heap size of the job is sized according to actual usage and scaled proportionally to the maximum expected number of records. This will solve memory issues for downscaled jobs which suffer from too little memory, because memory is currently not scaled proportionally to actual memory usage, but proportionally to CPU usage. To use some example numbers:

Example 1:

```
num_task_managers_after_rescale = 5
data_change_rate = 75 rec/s / 100 rec/s = 0.75
total_heap = 10 * 15 Gb * 0.75 = 112.5 Gb
tm_memory = 112.5 Gb / 5 = 22.5 Gb
```

Example 2:

```
num_task_managers_after_rescale = 5
data_change_rate = 50 rec/s / 100 rec/s = 0.5
total_heap = 10 * 15 Gb * 0.5 = 75 Gb
tm_memory = 75 Gb / 5 = 15 Gb
```

You can see that the heap memory per TM in Example 2 actually stayed the same, despite the data processing needs halving. That is not that different from the scaling we currently have, and I can see how the `data_change_rate` may seem redundant. The main reason I integrated it was to supply headroom to operators that consume more records without necessarily adding more task managers. I'm open to simplifying and removing the `data_change_rate`, simply tuning the memory according to the used memory. From looking at metrics of actual deployments, I can observe that `sum(heap_used) / sum(num_records_processed)` is a fairly constant value.

> I haven't looked at the PR logic in detail but it's a bit hard for me to mentally visualise the expectations just by the description alone. Some more illustrations / examples would go a long way :)

I'll create a doc.
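For concreteness, the formula above can also be expressed as a small, runnable sketch. This is only an illustration of the calculation, not code from the PR; the class, method, and parameter names are hypothetical:

```java
/**
 * Minimal sketch of the heap-scaling calculation discussed above.
 * All names here are illustrative, not the actual fields in the PR.
 */
public class HeapScalingExample {

    /**
     * Computes the target heap per task manager (in Gb) after a rescale.
     *
     * @param numTaskManagers             current number of task managers
     * @param observedAverageHeapUsageGb  observed average heap usage per TM, in Gb
     * @param expectedProcessingRate      summed expected processing rate (rec/s)
     * @param currentProcessingRate       summed current processing rate (rec/s)
     * @param numTaskManagersAfterRescale task manager count after the rescale
     */
    static double tmHeapAfterRescale(
            int numTaskManagers,
            double observedAverageHeapUsageGb,
            double expectedProcessingRate,
            double currentProcessingRate,
            int numTaskManagersAfterRescale) {
        // data_change_rate = sum(expected_processing_rate) / sum(current_processing_rate)
        double dataChangeRate = expectedProcessingRate / currentProcessingRate;
        // total_heap = num_task_managers * observed_average_heap_usage * data_change_rate
        double totalHeapGb = numTaskManagers * observedAverageHeapUsageGb * dataChangeRate;
        // tm_memory = total_heap / num_task_managers_after_rescale
        return totalHeapGb / numTaskManagersAfterRescale;
    }

    public static void main(String[] args) {
        // Example 1: 10 TMs at 15 Gb average heap, rate drops to 75%, rescale to 5 TMs.
        System.out.println(tmHeapAfterRescale(10, 15.0, 75, 100, 5)); // 22.5
        // Example 2: rate drops to 50%; per-TM heap stays at 15 Gb.
        System.out.println(tmHeapAfterRescale(10, 15.0, 50, 100, 5)); // 15.0
    }
}
```

Running this reproduces the 22.5 Gb and 15 Gb per-TM figures from Examples 1 and 2.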
