featzhang created FLINK-39059:
---------------------------------

             Summary: Add unified metrics support to AsyncPredictFunction and 
PredictFunction
                 Key: FLINK-39059
                 URL: https://issues.apache.org/jira/browse/FLINK-39059
             Project: Flink
          Issue Type: Sub-task
            Reporter: featzhang


h3. Subtask: Add Built-in Metrics for Model Inference Functions

*Description*
Introduce unified, built-in metrics support for model inference in Flink by 
enhancing both {{PredictFunction}} and {{AsyncPredictFunction}}. The goal 
is to provide consistent observability for inference workloads without 
requiring changes in individual model implementations.

*Scope*
 * Add common metrics instrumentation to the base inference function classes.
 * Ensure both synchronous and asynchronous inference paths are covered.
 * Automatically enable metrics for all existing and future model connectors 
(e.g., OpenAI, Triton).

*Metrics Included*
 * {{inference_requests}}: Total number of inference requests.
 * {{inference_requests_success}}: Number of successful inference requests.
 * {{inference_requests_failure}}: Number of failed inference requests.
 * {{inference_latency}}: Histogram of inference latency in milliseconds.
 * {{inference_rows_output}}: Total number of output rows produced by inference.
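To illustrate how the five metrics relate to a single inference call, here is a minimal self-contained sketch. The class and field names are hypothetical, and plain {{AtomicLong}}/sample-list stand-ins are used in place of Flink's actual {{Counter}} and {{Histogram}} metric types; only the instrumentation pattern is the point.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical sketch of the proposed built-in metrics around one inference call.
class InferenceMetricsSketch {
    final AtomicLong requests = new AtomicLong();         // inference_requests
    final AtomicLong requestsSuccess = new AtomicLong();  // inference_requests_success
    final AtomicLong requestsFailure = new AtomicLong();  // inference_requests_failure
    final AtomicLong rowsOutput = new AtomicLong();       // inference_rows_output
    final List<Long> latencyMillis = new ArrayList<>();   // inference_latency samples

    /** Wraps one inference call and records all five metrics. */
    List<?> measure(Supplier<List<?>> inferenceCall) {
        requests.incrementAndGet();
        long start = System.currentTimeMillis();
        try {
            List<?> rows = inferenceCall.get();
            requestsSuccess.incrementAndGet();
            rowsOutput.addAndGet(rows.size());
            return rows;
        } catch (RuntimeException e) {
            requestsFailure.incrementAndGet();
            throw e;
        } finally {
            latencyMillis.add(System.currentTimeMillis() - start);
        }
    }
}
{code}

Because the wrapping happens in the base class, individual connectors (OpenAI, Triton, etc.) would not need to touch the counters themselves.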

*Extensibility*
 * Provide a {{createLatencyHistogram()}} hook method.
 * Allow subclasses to customize latency histogram behavior (e.g., bucket 
configuration).
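The hook could look roughly like the following sketch. The interface and bucket logic are simplified stand-ins (not Flink's real metric API); the sketch only shows the override pattern a subclass would use to customize bucket boundaries.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Simplified stand-in for a latency histogram; illustrative only.
interface LatencySketchHistogram {
    void update(long millis);
}

class BasePredictSketch {
    /** Default latency histogram; subclasses may override. */
    protected LatencySketchHistogram createLatencyHistogram() {
        return millis -> { /* default recording strategy */ };
    }
}

class BucketedPredictSketch extends BasePredictSketch {
    // Custom bucket boundaries in ms: (<=10], (10,100], (100,1000], (1000,inf)
    final long[] boundsMillis = {10, 100, 1000};
    final AtomicLong[] buckets = {
        new AtomicLong(), new AtomicLong(), new AtomicLong(), new AtomicLong()
    };

    @Override
    protected LatencySketchHistogram createLatencyHistogram() {
        return millis -> {
            int i = 0;
            while (i < boundsMillis.length && millis > boundsMillis[i]) i++;
            buckets[i].incrementAndGet();
        };
    }
}
{code}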

*Acceptance Criteria*
 * Metrics are registered automatically without modifying existing model 
implementations.
 * Metrics are exposed consistently for both {{PredictFunction}} and 
{{AsyncPredictFunction}}.
 * No regression in existing inference functionality.
 * Metric names and semantics are aligned with Flink metrics conventions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)