[
https://issues.apache.org/jira/browse/FLINK-39160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arun Lakshman updated FLINK-39160:
----------------------------------
Summary: [Runtime][Rpc][Metrics] Expose RPC response frame size and
oversized-response rejection metrics (was: [runtime][rpc][metrics] Expose RPC
response frame size and oversized-response rejection metrics)
> [Runtime][Rpc][Metrics] Expose RPC response frame size and oversized-response
> rejection metrics
> -----------------------------------------------------------------------------------------------
>
> Key: FLINK-39160
> URL: https://issues.apache.org/jira/browse/FLINK-39160
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / RPC
> Affects Versions: 2.2.0
> Reporter: Arun Lakshman
> Priority: Minor
> Labels: metrics, rpc
>
> Flink currently exposes no RPC-level metrics for serialized response frame
> sizes or oversized-response rejections. Responses that exceed
> pekko.framesize are rejected, but the response-size trend leading up to a
> rejection is not easily visible. This makes it difficult to diagnose RPC
> failures, tune frame-size settings, and detect payload-size regressions in
> production.
>
> Today, oversized RPC responses are visible mainly through error logs, with
> no dedicated metric to track response sizes or rejection frequency over
> time. Diagnosis is therefore reactive and noisy, since operators must grep
> logs instead of using dashboards and alerts.
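
For illustration only, the kind of bookkeeping the issue asks for could be sketched as below. This is not taken from the ticket or from Flink's code base: the class RpcResponseSizeTracker and all of its methods are hypothetical, the limit is passed in as a plain byte count rather than read from pekko.framesize, and a simple greater-than comparison is assumed for the rejection check. In a real change these values would presumably be registered with Flink's metric system instead of being held in a standalone class.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical helper for illustration; not part of Flink's API. */
public class RpcResponseSizeTracker {

    // Assumed to mirror the configured frame size limit, in bytes.
    private final long maxFrameSizeBytes;

    // Number of responses that exceeded the limit.
    private final LongAdder oversizedResponses = new LongAdder();

    // Largest serialized response observed so far, in bytes.
    private final AtomicLong largestResponseBytes = new AtomicLong();

    public RpcResponseSizeTracker(long maxFrameSizeBytes) {
        this.maxFrameSizeBytes = maxFrameSizeBytes;
    }

    /**
     * Records the size of one serialized response.
     *
     * @return true if the response fits within the limit, false if it is oversized
     */
    public boolean record(long serializedBytes) {
        largestResponseBytes.accumulateAndGet(serializedBytes, Math::max);
        if (serializedBytes > maxFrameSizeBytes) {
            oversizedResponses.increment();
            return false;
        }
        return true;
    }

    public long oversizedResponseCount() {
        return oversizedResponses.sum();
    }

    public long largestResponseBytes() {
        return largestResponseBytes.get();
    }
}
```

Exposing the largest observed size next to a rejection counter would let a dashboard show how close responses come to the limit before any rejection occurs, which is the trend the description says is currently missing.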