[ 
https://issues.apache.org/jira/browse/FLINK-39153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu updated FLINK-39153:
------------------------
    Description: 
Currently, flink-python only supports basic user-defined metrics via the Beam 
framework. There is no built-in visibility into key performance aspects such as 
bundle processing time, serialization/deserialization overhead, gRPC 
communication latency, UDF execution time, state access performance, or Python 
process health (CPU, memory, GC).

This JIRA proposes adding comprehensive system-level performance metrics 
covering the entire data flow path in flink-python, enabling users to diagnose 
performance bottlenecks within minutes instead of hours.

An initial design is as follows:
[https://docs.google.com/document/d/1ovs29Pup8bRpV4qY8wI611nYV55zf771tJXQksPwehw/edit?usp=sharing]

  was:
Currently, flink-python only supports basic user-defined metrics via the Beam 
framework. There is no built-in visibility into key performance aspects such as 
bundle processing time, serialization/deserialization overhead, gRPC 
communication latency, UDF execution time, state access performance, or Python 
process health (CPU, memory, GC).

This JIRA proposes adding comprehensive system-level performance metrics 
covering the entire data flow path in flink-python, enabling users to diagnose 
performance bottlenecks within minutes instead of hours.

A FLIP will be drafted later.


> Add comprehensive performance metrics for PyFlink
> -------------------------------------------------
>
>                 Key: FLINK-39153
>                 URL: https://issues.apache.org/jira/browse/FLINK-39153
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / Python
>            Reporter: Liu
>            Priority: Major
>
> Currently, flink-python only supports basic user-defined metrics via the Beam 
> framework. There is no built-in visibility into key performance aspects such 
> as bundle processing time, serialization/deserialization overhead, gRPC 
> communication latency, UDF execution time, state access performance, or 
> Python process health (CPU, memory, GC).
> This JIRA proposes adding comprehensive system-level performance metrics 
> covering the entire data flow path in flink-python, enabling users to 
> diagnose performance bottlenecks within minutes instead of hours.
> An initial design is as follows:
> [https://docs.google.com/document/d/1ovs29Pup8bRpV4qY8wI611nYV55zf771tJXQksPwehw/edit?usp=sharing]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
