featzhang opened a new pull request, #27560:
URL: https://github.com/apache/flink/pull/27560

   This PR adds built-in metrics for monitoring model inference performance in 
both AsyncPredictFunction and PredictFunction.
   
   ## Changes
   - Add built-in metrics for model inference monitoring
   - Metrics include: requests, success/failure counts, latency, output rows  
   - Subclasses can override `createLatencyHistogram()` to provide a custom 
histogram
   - All model implementations (OpenAI, Triton, etc.) automatically get metrics
   - Zero code changes needed in subclasses
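
   The base-class pattern described above can be sketched as follows. This is an illustrative, self-contained sketch only, not the actual Flink classes: the names `BasePredictFunction`, `doPredict`, and `SimpleHistogram` are hypothetical stand-ins for the real `PredictFunction`/`AsyncPredictFunction` hierarchy and Flink's `Histogram` interface.

   ```java
   import java.util.concurrent.atomic.AtomicLong;

   // Hypothetical minimal histogram; the PR's real code would use Flink's Histogram.
   class SimpleHistogram {
       private final AtomicLong count = new AtomicLong();
       void update(long millis) { count.incrementAndGet(); }
       long getCount() { return count.get(); }
   }

   // Illustrative base class: metrics are tracked around the prediction call,
   // so subclasses get them without any code changes.
   abstract class BasePredictFunction {
       protected final AtomicLong requests = new AtomicLong();
       protected final AtomicLong successes = new AtomicLong();
       protected final AtomicLong failures = new AtomicLong();
       protected final AtomicLong rowsOutput = new AtomicLong();
       protected final SimpleHistogram latency = createLatencyHistogram();

       // Subclasses may override this to supply a custom histogram implementation.
       protected SimpleHistogram createLatencyHistogram() {
           return new SimpleHistogram();
       }

       // Template method: counts the request, times it, and records the outcome.
       final int predict(String input) {
           requests.incrementAndGet();
           long start = System.nanoTime();
           try {
               int rows = doPredict(input);
               successes.incrementAndGet();
               rowsOutput.addAndGet(rows);
               return rows;
           } catch (RuntimeException e) {
               failures.incrementAndGet();
               throw e;
           } finally {
               latency.update((System.nanoTime() - start) / 1_000_000);
           }
       }

       // The model-specific inference (OpenAI, Triton, ...) lives here.
       protected abstract int doPredict(String input);
   }
   ```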
   
   ## Metrics Provided
   The following metrics are automatically tracked:
   - `inference_requests`: Total number of inference requests
   - `inference_requests_success`: Number of successful inference requests
   - `inference_requests_failure`: Number of failed inference requests
   - `inference_latency`: Histogram of inference latency in milliseconds 
(optional; subclasses can override the histogram implementation)
   - `inference_rows_output`: Total number of output rows from inference
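
   How one inference call would update this schema can be sketched with plain counters. This is an assumption-laden illustration, not the PR's code: `InferenceMetrics`, `recordSuccess`, and `recordFailure` are hypothetical helpers; only the metric names come from the list above.

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   // Hypothetical tracker keyed by the metric names listed above.
   class InferenceMetrics {
       final Map<String, Long> metrics = new LinkedHashMap<>();

       InferenceMetrics() {
           for (String name : new String[] {
                   "inference_requests", "inference_requests_success",
                   "inference_requests_failure", "inference_rows_output"}) {
               metrics.put(name, 0L);
           }
       }

       private void inc(String name, long delta) {
           metrics.merge(name, delta, Long::sum);
       }

       // A successful call counts the request, the success, and the rows it produced.
       void recordSuccess(long rows) {
           inc("inference_requests", 1);
           inc("inference_requests_success", 1);
           inc("inference_rows_output", rows);
       }

       // A failed call still counts the request, plus the failure.
       void recordFailure() {
           inc("inference_requests", 1);
           inc("inference_requests_failure", 1);
       }
   }
   ```

   With this shape, `inference_requests` always equals the sum of the success and failure counters, which makes error-rate dashboards straightforward.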
   
   ## Benefits
   1. **Zero code changes for subclasses**: All existing model implementations 
automatically get metrics
   2. **Consistent monitoring**: All model functions use the same metrics schema
   3. **Extensible**: Subclasses can override `createLatencyHistogram()` for 
custom implementations
   4. **Performance insight**: Provides visibility into model inference 
performance and reliability


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
