[ 
https://issues.apache.org/jira/browse/FLINK-32954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758835#comment-17758835
 ] 

Hangxiang Yu commented on FLINK-32954:
--------------------------------------

Thanks for reporting this.

Heap timers can perform better than RocksDB timers, but they increase the 
pressure on the GC.

We have seen heap timers cause frequent GC many times in production 
environments.

Today we have to take a heap dump to find the root cause.

This metric could help us find the bottleneck of heap timers quickly.

So I think it makes sense, as long as it does not introduce complex code or 
extra cost.

I have already assigned the issue to you, please go ahead.

> Metrics expose number of heap timer
> -----------------------------------
>
>                 Key: FLINK-32954
>                 URL: https://issues.apache.org/jira/browse/FLINK-32954
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Metrics, Runtime / State Backends
>            Reporter: Rui Xia
>            Priority: Major
>
> Expose the current number of heap timers to the metric reporter. Internally, 
> expose the size of `InternalPriorityQueue` to `MetricGroup`. 
>  
> This metric can aid in debugging the memory consumption of heap timers. 
> Currently, we need to dump the JVM heap to identify the memory consumption of 
> heap timers. With this metric, users can quickly get a basic picture of the 
> working condition of heap timers. 
>  
> The *numOfTimers* metric is only suitable for heap timers. The heap priority 
> queue (`AbstractHeapPriorityQueue`) has an off-the-shelf member to get its 
> size (`AbstractHeapPriorityQueue#size`). Exposing it to the metric reporter 
> is zero-cost.
>  
> A *numOfTimers* metric for RocksDB timers would bring a performance loss, and 
> is not supported right now. The RocksDB (cached) priority queue does *not* 
> have an off-the-shelf member to get its size. Intuitively, we could add a 
> dedicated counter for the RocksDB priority queue, but this counter would 
> affect the runtime per-record code path. Thus, the *numOfTimers* metric for 
> RocksDB timers is currently not supported. There may be better, more 
> lightweight metrics to let users learn the working condition of RocksDB 
> timers.
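
As a rough illustration of why the heap-timer gauge described above is cheap, 
here is a minimal, self-contained sketch. It is not the Flink implementation: 
`Gauge` below is a local stand-in that only mirrors the shape of Flink's 
`org.apache.flink.metrics.Gauge<T>`, while `SimpleMetricGroup` and 
`HeapTimerQueue` are hypothetical names made up for this example. The point is 
that the gauge calls `size()` only when a reporter polls it, so nothing is 
added to the per-record code path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Local stand-in mirroring the shape of Flink's Gauge<T> (one getValue() method).
interface Gauge<T> {
    T getValue();
}

// Hypothetical, minimal metric group: it only stores gauges by name.
class SimpleMetricGroup {
    private final Map<String, Gauge<?>> gauges = new HashMap<>();

    void gauge(String name, Gauge<?> gauge) {
        gauges.put(name, gauge);
    }

    // Evaluates the gauge lazily, the way a reporter would when polling.
    Object read(String name) {
        return gauges.get(name).getValue();
    }
}

// Hypothetical heap timer queue; timers are modelled as bare timestamps.
class HeapTimerQueue {
    private final PriorityQueue<Long> timers = new PriorityQueue<>();

    void register(long timestamp) {
        timers.add(timestamp);
    }

    Long pollEarliest() {
        return timers.poll();
    }

    // Already maintained by the heap, so reading it costs nothing extra.
    int size() {
        return timers.size();
    }
}

class HeapTimerMetricSketch {
    public static void main(String[] args) {
        SimpleMetricGroup group = new SimpleMetricGroup();
        HeapTimerQueue queue = new HeapTimerQueue();

        // Register the gauge once; it captures the queue, not a snapshot.
        Gauge<Integer> numOfTimers = queue::size;
        group.gauge("numOfTimers", numOfTimers);

        queue.register(10L);
        queue.register(5L);
        System.out.println(group.read("numOfTimers")); // prints 2

        queue.pollEarliest();
        System.out.println(group.read("numOfTimers")); // prints 1
    }
}
```

A RocksDB-backed queue has no such ready-made size, so the same gauge would 
need a counter updated on every timer insert and removal, which is the 
per-record cost the description wants to avoid.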



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
