featzhang created FLINK-39059:
---------------------------------
Summary: Add unified metrics support to AsyncPredictFunction and
PredictFunction
Key: FLINK-39059
URL: https://issues.apache.org/jira/browse/FLINK-39059
Project: Flink
Issue Type: Sub-task
Reporter: featzhang
h3. Subtask: Add Built-in Metrics for Model Inference Functions
*Description*
Introduce unified, built-in metrics support for model inference in Flink by
enhancing both {{PredictFunction}} and {{AsyncPredictFunction}}. The goal
is to provide consistent observability for inference workloads without
requiring changes in individual model implementations.
*Scope*
* Add common metrics instrumentation to the base inference function classes.
* Ensure both synchronous and asynchronous inference paths are covered.
* Automatically enable metrics for all existing and future model connectors
(e.g., OpenAI, Triton).
*Metrics Included*
* {{inference_requests}}: Total number of inference requests.
* {{inference_requests_success}}: Number of successful inference requests.
* {{inference_requests_failure}}: Number of failed inference requests.
* {{inference_latency}}: Histogram of inference latency in milliseconds.
* {{inference_rows_output}}: Total number of output rows produced by
inference.
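The intended counting semantics can be sketched in plain Java. This is an illustration only: the class and method names below are assumptions from this ticket, and the real implementation would register Flink {{Counter}} and {{Histogram}} instances on the function's metric group in {{open()}} rather than use plain fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

/** Self-contained sketch of the proposed metric bookkeeping (names are illustrative). */
public class InferenceMetricsSketch {
    long inferenceRequests;         // inference_requests
    long inferenceRequestsSuccess;  // inference_requests_success
    long inferenceRequestsFailure;  // inference_requests_failure
    long inferenceRowsOutput;       // inference_rows_output
    final List<Long> latenciesMs = new ArrayList<>(); // samples for inference_latency

    /** Wraps one inference call: counts the request, times it, records the outcome. */
    public int trackedPredict(IntSupplier predict) {
        inferenceRequests++;
        long start = System.nanoTime();
        try {
            int rows = predict.getAsInt(); // number of output rows from this call
            inferenceRequestsSuccess++;
            inferenceRowsOutput += rows;
            return rows;
        } catch (RuntimeException e) {
            inferenceRequestsFailure++;
            throw e;
        } finally {
            // Latency is recorded for both successful and failed requests.
            latenciesMs.add((System.nanoTime() - start) / 1_000_000L);
        }
    }

    public static void main(String[] args) {
        InferenceMetricsSketch m = new InferenceMetricsSketch();
        m.trackedPredict(() -> 3); // successful call producing 3 rows
        try {
            m.trackedPredict(() -> { throw new RuntimeException("model error"); });
        } catch (RuntimeException ignored) { }
        System.out.println(m.inferenceRequests + " requests, "
                + m.inferenceRequestsSuccess + " success, "
                + m.inferenceRequestsFailure + " failure, "
                + m.inferenceRowsOutput + " rows");
    }
}
```

Because the wrapper handles both outcomes, {{inference_requests}} always equals {{inference_requests_success}} plus {{inference_requests_failure}}.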
*Extensibility*
* Provide a {{createLatencyHistogram()}} hook method.
* Allow subclasses to customize latency histogram behavior (e.g., bucket
configuration).
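The hook-method pattern described above could look like the following. This is a minimal, self-contained sketch: {{BucketHistogram}}, its bucket boundaries, and the {{Base}}/{{FineGrained}} classes are all hypothetical stand-ins (a real implementation would return Flink's {{Histogram}} type); only the {{createLatencyHistogram()}} name comes from this ticket.

```java
import java.util.Arrays;

/** Sketch of the proposed createLatencyHistogram() extensibility hook. */
public class LatencyHistogramHook {
    /** Hypothetical stand-in for a latency histogram with fixed bucket bounds. */
    public static class BucketHistogram {
        final long[] boundsMs;
        final long[] counts;
        BucketHistogram(long[] boundsMs) {
            this.boundsMs = boundsMs;
            this.counts = new long[boundsMs.length + 1]; // last slot is the overflow bucket
        }
        void update(long ms) {
            int i = 0;
            while (i < boundsMs.length && ms > boundsMs[i]) i++;
            counts[i]++;
        }
    }

    /** Base inference function exposes the hook with a default configuration. */
    public static class Base {
        protected BucketHistogram createLatencyHistogram() {
            return new BucketHistogram(new long[]{10, 100, 1000}); // assumed defaults
        }
    }

    /** A subclass customizes bucket configuration by overriding the hook. */
    public static class FineGrained extends Base {
        @Override
        protected BucketHistogram createLatencyHistogram() {
            return new BucketHistogram(new long[]{1, 5, 10, 50, 100});
        }
    }

    public static void main(String[] args) {
        BucketHistogram h = new FineGrained().createLatencyHistogram();
        h.update(7);   // falls into the (5, 10] bucket
        h.update(200); // exceeds all bounds, lands in the overflow bucket
        System.out.println(Arrays.toString(h.counts));
    }
}
```

The base class calls the hook during initialization, so subclasses change histogram behavior without touching the metric registration itself.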
*Acceptance Criteria*
* Metrics are registered automatically without modifying existing model
implementations.
* Metrics are exposed consistently for both {{PredictFunction}} and
{{AsyncPredictFunction}}.
* No regression in existing inference functionality.
* Metrics names and semantics are aligned with Flink metrics conventions.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)