[
https://issues.apache.org/jira/browse/FLINK-39160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18061217#comment-18061217
]
Arun Lakshman commented on FLINK-39160:
---------------------------------------
Pull request to add `pekko.framesize` metrics:
https://github.com/apache/flink/pull/27677
> [Runtime][Rpc][Metrics] Expose RPC response frame size and oversized-response
> rejection metrics
> -----------------------------------------------------------------------------------------------
>
> Key: FLINK-39160
> URL: https://issues.apache.org/jira/browse/FLINK-39160
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / RPC
> Affects Versions: 2.2.0
> Reporter: Arun Lakshman
> Priority: Minor
> Labels: metrics, rpc
>
> Flink currently lacks RPC-level metrics for serialized response frame sizes
> and oversized-response rejections. Responses that exceed pekko.framesize are
> rejected, but there is no easy way to see how response sizes trend over time.
> This makes it difficult to diagnose RPC failures, tune frame-size settings,
> and detect payload-size regressions in production.
> Today, oversized RPC responses are visible primarily through error logs,
> with no dedicated metric to track response sizes or rejection frequency over
> time. This makes diagnosis reactive and noisy, since operators must grep logs
> instead of using dashboards/alerts.