becketqin commented on pull request #13920:
URL: https://github.com/apache/flink/pull/13920#issuecomment-723556790


   @StephanEwen Thanks for the comment. I have updated the patch. Would you 
help take a look? 
   
   I have also left some thoughts regarding the performance impact. Please let 
me know if you still have concerns. 
   
   To add to the performance discussion: the first version of my 
implementation had a `DynamicMetricSampler` class which does the 
following:
   1. Take a target metric reporting overhead in percentage, for example, 
0.01%.  
   2. Measure the absolute time it takes to report a metric, e.g. 1000 ns. 
   3. Based on the target overhead and the current throughput, calculate the 
metric sampling interval. In the above case, if the throughput is 1000 records 
per second, each record takes 1 ms (1,000,000 ns) to process. If the overhead 
is 0.01%, the budget for metric reporting is 100 ns per record. Given that each 
metric report takes 1000 ns, a metric should be reported once every 10 records.
   4. The metric sampling interval is adjusted periodically to reflect the 
latest throughput.
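
   The calculation in steps 1 to 4 can be sketched roughly as below. This is 
only an illustration, not the class from the earlier version of the patch: 
the class name `DynamicMetricSamplerSketch`, its methods, and the choice to 
express the target overhead in parts per million (0.01% = 100 ppm, which keeps 
the arithmetic in integers) are all assumptions.

```java
/** Hypothetical sketch of a throughput-based metric sampler; not the class from the patch. */
final class DynamicMetricSamplerSketch {
    private final long targetOverheadPpm; // assumed unit: parts per million, 0.01% == 100 ppm
    private final long reportCostNanos;   // measured cost of one metric report (step 2)
    private long interval = 1;            // report on every record until throughput is known
    private long recordsSinceReport = 0;

    DynamicMetricSamplerSketch(long targetOverheadPpm, long reportCostNanos) {
        this.targetOverheadPpm = targetOverheadPpm;
        this.reportCostNanos = reportCostNanos;
    }

    /** Step 3: smallest interval (in records) that keeps reporting within the overhead budget. */
    static long samplingInterval(long targetOverheadPpm, long reportCostNanos, long recordsPerSecond) {
        if (targetOverheadPpm <= 0 || reportCostNanos <= 0 || recordsPerSecond <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // budgetNanosPerRecord = (1e9 / recordsPerSecond) * (ppm / 1e6)
        //                      = 1000 * ppm / recordsPerSecond
        // interval = ceil(reportCostNanos / budgetNanosPerRecord), done in integer arithmetic.
        long numerator = Math.multiplyExact(reportCostNanos, recordsPerSecond);
        long denominator = Math.multiplyExact(1_000L, targetOverheadPpm);
        return numerator / denominator + (numerator % denominator == 0 ? 0 : 1);
    }

    /** Step 4: called periodically with the latest observed throughput. */
    void onThroughputUpdate(long recordsPerSecond) {
        interval = samplingInterval(targetOverheadPpm, reportCostNanos, recordsPerSecond);
    }

    /** Called once per record; returns true when a metric should be reported for it. */
    boolean shouldReport() {
        if (++recordsSinceReport >= interval) {
            recordsSinceReport = 0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Numbers from the example: 0.01% overhead, 1000 ns per report, 1000 records/s.
        System.out.println(samplingInterval(100, 1_000, 1_000)); // prints 10
    }
}
```

   With the numbers from step 3 this gives an interval of 10 records. At a 
higher throughput the per-record budget shrinks, so the interval grows 
proportionally.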
   
   The above logic gives a quantifiable, bounded performance impact on 
throughput. I removed it because in most cases periodic reporting is good 
enough, e.g. reporting metrics every second, so we can avoid the extra 
complexity. If you think this approach helps, we can bring it back.



