fdc91 opened a new pull request, #26204:
URL: https://github.com/apache/flink/pull/26204

   ## What is the purpose of the change
   
   This PR addresses a performance regression in the `MetricStore` that affects
   clients fetching metrics, such as the autoscaler or the web UI. The /metrics
   endpoint can become unresponsive because transient metrics of completed
   subtasks are removed synchronously during metric retrieval. This cleanup
   causes significant slowdowns, particularly when the JobManager (JM) has
   multiple jobs or many subtasks in a terminal state, and the resulting delays
   prevent timely metric fetching for latency-sensitive clients such as the
   autoscaler. Flamegraph analysis identified the root cause as the inefficient,
   synchronous execution of the cleanup routine introduced with FLINK-31650.
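   To make the cost concrete, here is a minimal sketch of the pattern described
   above. It is not Flink's actual `MetricStore` code: it assumes a simplified
   in-memory store keyed by subtask ID, and assumes the cleanup finds transient
   metrics by scanning every stored metric of every terminal subtask on each
   read. The class `SyncCleanupStore`, its methods, and the `scannedEntries`
   counter are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified store (not Flink's MetricStore API) illustrating the
 * slow pattern: every read re-runs the transient-metric cleanup over all
 * terminal subtasks.
 */
class SyncCleanupStore {
    // subtask id -> (metric name -> value)
    private final Map<String, Map<String, String>> metrics = new HashMap<>();
    private final Set<String> terminalSubtasks = new HashSet<>();
    // Hypothetical: the set of metric names considered "transient".
    private final Set<String> transientNames;
    // Counts metric entries inspected by the cleanup, to make its cost visible.
    long scannedEntries = 0;

    SyncCleanupStore(Set<String> transientNames) {
        this.transientNames = transientNames;
    }

    synchronized void add(String subtask, String name, String value) {
        metrics.computeIfAbsent(subtask, k -> new HashMap<>()).put(name, value);
    }

    synchronized void markTerminal(String subtask) {
        terminalSubtasks.add(subtask);
    }

    synchronized Map<String, String> get(String subtask) {
        // Cleanup runs on every read and scans every metric of every terminal subtask.
        for (String terminal : terminalSubtasks) {
            Map<String, String> m = metrics.get(terminal);
            if (m == null) {
                continue;
            }
            for (Iterator<String> it = m.keySet().iterator(); it.hasNext(); ) {
                scannedEntries++;
                if (transientNames.contains(it.next())) {
                    it.remove();
                }
            }
        }
        return new HashMap<>(metrics.getOrDefault(subtask, Map.of()));
    }
}
```

   Because the scan repeats on every `get`, its cost grows with the number of
   terminal subtasks and their metrics, and every request pays it again, even
   after the transient metrics are already gone.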
   
   ## Brief change log
   
     - Optimized the metrics cleanup process in `MetricStore` by caching the
       names of transient metrics when they are first stored
     - Improved metric removal efficiency by executing the cleanup routine
       only once
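   The two change-log items can be illustrated with a minimal sketch. This is
   not the PR's actual code: it assumes transient metric names are cached per
   subtask when first stored, and that the cleanup for a subtask runs exactly
   once, when the subtask is marked terminal, so reads do no cleanup work.
   `CachedCleanupStore`, its methods, and the policy of ignoring transient
   updates that arrive after a subtask has finished are hypothetical choices
   made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (not Flink's MetricStore code): cache transient metric
 * names on first store, and remove them exactly once per terminal subtask.
 */
class CachedCleanupStore {
    // subtask id -> (metric name -> value)
    private final Map<String, Map<String, String>> metrics = new HashMap<>();
    // subtask id -> names of transient metrics stored for it, cached on add
    private final Map<String, Set<String>> transientBySubtask = new HashMap<>();
    private final Set<String> terminalSubtasks = new HashSet<>();
    // Hypothetical: the set of metric names considered "transient".
    private final Set<String> transientNames;
    long removedEntries = 0;

    CachedCleanupStore(Set<String> transientNames) {
        this.transientNames = transientNames;
    }

    synchronized void add(String subtask, String name, String value) {
        boolean isTransient = transientNames.contains(name);
        if (isTransient && terminalSubtasks.contains(subtask)) {
            // Hypothetical policy: ignore late transient updates for finished subtasks.
            return;
        }
        metrics.computeIfAbsent(subtask, k -> new HashMap<>()).put(name, value);
        if (isTransient) {
            // Cache the name when stored, so cleanup never scans all metrics.
            transientBySubtask.computeIfAbsent(subtask, k -> new HashSet<>()).add(name);
        }
    }

    synchronized void markTerminal(String subtask) {
        if (!terminalSubtasks.add(subtask)) {
            return; // already cleaned up once; nothing left to do
        }
        // Remove only the cached transient names, then drop the cache entry.
        Set<String> names = transientBySubtask.remove(subtask);
        Map<String, String> m = metrics.get(subtask);
        if (names != null && m != null) {
            for (String name : names) {
                if (m.remove(name) != null) {
                    removedEntries++;
                }
            }
        }
    }

    synchronized Map<String, String> get(String subtask) {
        // Reads perform no cleanup work.
        return new HashMap<>(metrics.getOrDefault(subtask, Map.of()));
    }
}
```

   The trade-off is a small per-subtask name cache in exchange for a cleanup
   whose cost is proportional to the number of transient metrics rather than to
   all stored metrics, and which is paid once instead of on every read.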
   
   ## Verifying this change
   
   This change is covered by the existing unit tests added in
   https://github.com/apache/flink/pull/23988.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with
       `@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its
       components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? not applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
