mxm commented on PR #762: URL: https://github.com/apache/flink-kubernetes-operator/pull/762#issuecomment-1908302386
> Hi @mxm, thanks for your contribution!
>
> I have some questions about this tuning strategy, and I left some comments. Please take a look in your free time, thanks~
>
> ### 1. Establish a heap memory baseline
>
> > We observe the average heap memory usage (`heap_usage`) at task managers.
>
> Is it possible that the average heap memory usage is low while a few TMs are high? (This happens when data is skewed.)

Certainly possible, e.g. for hot-key scenarios. We could take the max of the per-task-manager average usage to alleviate that concern.

> ### 2. Calculate memory usage per record
>
> > The memory requirements per record can be estimated by calculating this ratio:
> >
> > ```
> > heap_memory_per_rec = sum(heap_usage) / sum(processing_rate)
> > ```
> >
> > This ratio is surprisingly constant based on looking at empirical data.
>
> I'm curious about this assumption. In general, Flink processes records one by one; it only processes one record at a time. So heap memory shouldn't cache more data even if the processing rate gets high.

From what I observed, heap memory correlated closely with the number of records processed. There is some amount of memory overhead with every record processed. Maybe the data I looked at was biased. I'm open to changing this formula.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
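To make the two calculations above concrete, here is a minimal Java sketch of the max-of-per-TM-average baseline and the `heap_memory_per_rec` ratio. This is only an illustration under the assumptions discussed in this thread, not code from the PR; the class and method names (`HeapTuningSketch`, `heapUsageBaseline`, `heapMemoryPerRecord`) are hypothetical.

```java
import java.util.List;

public class HeapTuningSketch {

    /**
     * Baseline heap usage: take the average heap usage per task manager,
     * then the max across task managers, so that a single hot TM (data skew)
     * is not masked by a low global average.
     */
    static double heapUsageBaseline(List<double[]> perTmHeapSamples) {
        double max = 0.0;
        for (double[] samples : perTmHeapSamples) {
            if (samples.length == 0) {
                continue;
            }
            double sum = 0.0;
            for (double s : samples) {
                sum += s;
            }
            max = Math.max(max, sum / samples.length);
        }
        return max;
    }

    /**
     * heap_memory_per_rec = sum(heap_usage) / sum(processing_rate),
     * summed over task managers for a given metric window.
     */
    static double heapMemoryPerRecord(double[] heapUsage, double[] processingRate) {
        double heapSum = 0.0;
        double rateSum = 0.0;
        for (double h : heapUsage) {
            heapSum += h;
        }
        for (double r : processingRate) {
            rateSum += r;
        }
        // Guard against a zero processing rate (e.g. an idle pipeline).
        return rateSum == 0.0 ? 0.0 : heapSum / rateSum;
    }
}
```

For example, with two TMs averaging 2.0 and 5.0 units of heap, the baseline is 5.0 rather than the global average of 3.5, which directly addresses the skew concern raised above.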
