mxm commented on PR #762: URL: https://github.com/apache/flink-kubernetes-operator/pull/762#issuecomment-1908302386
> Hi @mxm, thanks for your contribution!
>
> I have some questions about this tuning strategy, and I left some comments. Please take a look in your free time, thanks~
>
> ### 1. Establish a heap memory baseline
>
> > We observe the average heap memory usage (`heap_usage`) at task managers.
>
> Is it possible that the average heap memory usage is low while a few TMs are high? (This happens when data is skewed.)

Certainly possible, e.g. for hot-key scenarios. We could take the max of the per-task-manager average usage to alleviate that concern.

> ### 2. Calculate memory usage per record
>
> > The memory requirements per record can be estimated by calculating this ratio:
> >
> > ```
> > heap_memory_per_rec = sum(heap_usage) / sum(processing_rate)
> > ```
> >
> > This ratio is surprisingly constant based on looking at empirical data.
>
> I'm curious about this assumption. In general, Flink processes records one by one; it only processes one record at a time. So heap memory shouldn't cache more data even if the processing rate gets high.

From what I observed, heap memory correlated closely with the number of records processed. There is some amount of memory overhead with every record processed. Maybe the data I looked at was biased. I'm open to changing this formula.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
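To make the two calculations above concrete, here is a minimal Java sketch of the max-of-per-TM-average baseline and the `heap_memory_per_rec` ratio. This is only an illustration under the assumptions discussed in this thread, not code from the PR; the class and method names (`HeapTuningSketch`, `heapUsageBaseline`, `heapMemoryPerRecord`) are hypothetical.

```java
import java.util.List;

public class HeapTuningSketch {

    /**
     * Baseline heap usage: take the average heap usage per task manager,
     * then the max across task managers, so that a single hot TM (data skew)
     * is not masked by a low global average.
     */
    static double heapUsageBaseline(List<double[]> perTmHeapSamples) {
        double max = 0.0;
        for (double[] samples : perTmHeapSamples) {
            if (samples.length == 0) {
                continue;
            }
            double sum = 0.0;
            for (double s : samples) {
                sum += s;
            }
            max = Math.max(max, sum / samples.length);
        }
        return max;
    }

    /**
     * heap_memory_per_rec = sum(heap_usage) / sum(processing_rate),
     * summed over task managers for a given metric window.
     */
    static double heapMemoryPerRecord(double[] heapUsage, double[] processingRate) {
        double heapSum = 0.0;
        double rateSum = 0.0;
        for (double h : heapUsage) {
            heapSum += h;
        }
        for (double r : processingRate) {
            rateSum += r;
        }
        // Guard against a zero processing rate (e.g. an idle pipeline).
        return rateSum == 0.0 ? 0.0 : heapSum / rateSum;
    }
}
```

For example, with two TMs averaging 2.0 and 5.0 units of heap, the baseline is 5.0 rather than the global average of 3.5, which directly addresses the skew concern raised above.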
