mxm commented on PR #762:
URL: https://github.com/apache/flink-kubernetes-operator/pull/762#issuecomment-1908759173

   > @mxm would you mind adding a few example calculations when memory would be 
scaled up or down based on the processing rate and heap usage metrics?
   
   An example:
   
   ```
   num_task_managers = 10
   tm_memory = 40 Gb
   observed_max_heap_size = 30 Gb
   observed_average_heap_usage = 15 Gb
   ```
   
   This will configure the heap memory to 15 Gb per task manager and reduce the total memory by `num_task_managers * 15 Gb = 150 Gb` (the per-TM heap shrinks from the observed 30 Gb max heap to the 15 Gb average usage), provided the expected true processing rate does not change. If it does change, the memory is updated as follows:
   
   ```
   data_change_rate = sum(expected_processing_rate) / sum(current_processing_rate)
   total_heap = num_task_managers * observed_average_heap_usage * data_change_rate
   tm_memory = total_heap / num_task_managers_after_rescale
   ```
   
   The total heap size of the job is sized according to actual usage and scaled proportionally to the maximum expected record rate. This solves memory issues for downscaled jobs, which currently suffer from too little memory because the memory size is not scaled proportionally to actual memory usage, but proportionally to CPU usage.
   
   To use some example numbers:
   
   Example 1:
   ```
   num_task_managers_after_rescale = 5
   data_change_rate = 75 rec/s / 100 rec/s = 0.75
   total_heap = 10 * 15 Gb * 0.75 = 112.5 Gb
   tm_memory = 112.5 Gb / 5 = 22.5 Gb
   ```
   
   Example 2:
   ```
   num_task_managers_after_rescale = 5
   data_change_rate = 50 rec/s / 100 rec/s = 0.5
   total_heap = 10 * 15 Gb * 0.5 = 75 Gb
   tm_memory = 75 Gb / 5 = 15 Gb
   ```
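
   To make the arithmetic above reproducible, here is a minimal, hypothetical sketch (not code from this PR; the class, method, and parameter names are placeholders) that implements the formula and reproduces Example 1 and Example 2:

   ```java
   // Hypothetical sketch, not code from this PR.
   public class HeapScalingSketch {

       // New per-TM heap (Gb) from the observed average heap usage and the rate change.
       static double newTmHeapGb(int numTaskManagers, double avgHeapUsageGb,
                                 double expectedRate, double currentRate,
                                 int numTaskManagersAfterRescale) {
           double dataChangeRate = expectedRate / currentRate;
           double totalHeapGb = numTaskManagers * avgHeapUsageGb * dataChangeRate;
           return totalHeapGb / numTaskManagersAfterRescale;
       }

       public static void main(String[] args) {
           System.out.println(newTmHeapGb(10, 15, 75, 100, 5)); // Example 1 -> 22.5
           System.out.println(newTmHeapGb(10, 15, 50, 100, 5)); // Example 2 -> 15.0
       }
   }
   ```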
   
   You can see that the heap memory per TM in Example 2 actually stayed the same, despite the data processing needs halving. That is not very different from the scaling we currently have, and I can see how the `data_change_rate` may seem redundant. The main reason I integrated it was to provide headroom for operators that consume more records but do not necessarily add more task managers.
   
   I'm open to simplifying and removing the `data_change_rate`, and simply tuning the memory according to the used memory. Looking at metrics from actual deployments, I observe that `sum(heap_used) / sum(num_records_processed)` is a fairly constant value.
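
   That observed ratio is essentially what the formula above relies on. As a purely hypothetical illustration (variable names and numbers are assumptions chosen to match the examples above, not code from this PR), the roughly constant heap-per-record ratio can be used directly to predict the total heap needed for an expected rate:

   ```java
   // Hypothetical illustration: if heap usage per processed record is roughly constant,
   // the total heap needed for an expected processing rate follows from that ratio.
   public class HeapPerRecordSketch {
       public static void main(String[] args) {
           double sumHeapUsedGb = 10 * 15;        // 10 TMs at 15 Gb average heap usage
           double sumRecordsPerSecond = 100;      // current processing rate
           double heapGbPerRecord = sumHeapUsedGb / sumRecordsPerSecond;

           double expectedRecordsPerSecond = 75;  // expected rate after rescaling
           double totalHeapGb = heapGbPerRecord * expectedRecordsPerSecond;
           System.out.println(totalHeapGb);       // 112.5 Gb, matching Example 1
       }
   }
   ```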
   
   > I haven't looked at the PR logic in detail but it's a bit hard for me to 
mentally visualise the expectations just by the description alone. Some more 
illustrations / examples would go a long way :)
   
   I'll create a doc.

