[ https://issues.apache.org/jira/browse/FLINK-39153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Liu updated FLINK-39153:
------------------------
Description:
Currently, flink-python only supports basic user-defined metrics via the Beam
framework. There is no built-in visibility into key performance aspects such as
bundle processing time, serialization/deserialization overhead, gRPC
communication latency, UDF execution time, state access performance, or Python
process health (CPU, memory, GC).
This JIRA proposes adding comprehensive system-level performance metrics
covering the entire data flow path in flink-python, enabling users to diagnose
performance bottlenecks within minutes instead of hours.
An initial design is as follows:
[https://docs.google.com/document/d/1ovs29Pup8bRpV4qY8wI611nYV55zf771tJXQksPwehw/edit?usp=sharing]
was:
Currently, flink-python only supports basic user-defined metrics via the Beam
framework. There is no built-in visibility into key performance aspects such as
bundle processing time, serialization/deserialization overhead, gRPC
communication latency, UDF execution time, state access performance, or Python
process health (CPU, memory, GC).
This JIRA proposes adding comprehensive system-level performance metrics
covering the entire data flow path in flink-python, enabling users to diagnose
performance bottlenecks within minutes instead of hours.
A FLIP will be drafted later.
> Add comprehensive performance metrics for PyFlink
> -------------------------------------------------
>
> Key: FLINK-39153
> URL: https://issues.apache.org/jira/browse/FLINK-39153
> Project: Flink
> Issue Type: Improvement
> Components: API / Python
> Reporter: Liu
> Priority: Major
>
> Currently, flink-python only supports basic user-defined metrics via the Beam
> framework. There is no built-in visibility into key performance aspects such
> as bundle processing time, serialization/deserialization overhead, gRPC
> communication latency, UDF execution time, state access performance, or
> Python process health (CPU, memory, GC).
> This JIRA proposes adding comprehensive system-level performance metrics
> covering the entire data flow path in flink-python, enabling users to
> diagnose performance bottlenecks within minutes instead of hours.
> An initial design is as follows:
> [https://docs.google.com/document/d/1ovs29Pup8bRpV4qY8wI611nYV55zf771tJXQksPwehw/edit?usp=sharing]