featzhang opened a new pull request, #27560:
URL: https://github.com/apache/flink/pull/27560

   This PR adds built-in metrics for monitoring model inference performance in 
both AsyncPredictFunction and PredictFunction.
   
   ## Changes
   - Add built-in metrics for model inference monitoring
   - Metrics include: requests, success/failure counts, latency, output rows  
   - Subclasses can override `createLatencyHistogram()` to provide a custom 
histogram
   - All model implementations (OpenAI, Triton, etc.) automatically get metrics
   - Zero code changes needed in subclasses
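
   The base-class pattern described above can be sketched as follows. This is an illustrative, self-contained sketch only, not the actual Flink classes: the names `BasePredictFunction`, `doPredict`, and `SimpleHistogram` are hypothetical stand-ins for the real `PredictFunction`/`AsyncPredictFunction` hierarchy and Flink's `Histogram` interface.

   ```java
   import java.util.concurrent.atomic.AtomicLong;

   // Hypothetical minimal histogram; the PR's real code would use Flink's Histogram.
   class SimpleHistogram {
       private final AtomicLong count = new AtomicLong();
       void update(long millis) { count.incrementAndGet(); }
       long getCount() { return count.get(); }
   }

   // Illustrative base class: metrics are tracked around the prediction call,
   // so subclasses get them without any code changes.
   abstract class BasePredictFunction {
       protected final AtomicLong requests = new AtomicLong();
       protected final AtomicLong successes = new AtomicLong();
       protected final AtomicLong failures = new AtomicLong();
       protected final AtomicLong rowsOutput = new AtomicLong();
       protected final SimpleHistogram latency = createLatencyHistogram();

       // Subclasses may override this to supply a custom histogram implementation.
       protected SimpleHistogram createLatencyHistogram() {
           return new SimpleHistogram();
       }

       // Template method: counts the request, times it, and records the outcome.
       final int predict(String input) {
           requests.incrementAndGet();
           long start = System.nanoTime();
           try {
               int rows = doPredict(input);
               successes.incrementAndGet();
               rowsOutput.addAndGet(rows);
               return rows;
           } catch (RuntimeException e) {
               failures.incrementAndGet();
               throw e;
           } finally {
               latency.update((System.nanoTime() - start) / 1_000_000);
           }
       }

       // The model-specific inference (OpenAI, Triton, ...) lives here.
       protected abstract int doPredict(String input);
   }
   ```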
   
   ## Metrics Provided
   The following metrics are automatically tracked:
   - `inference_requests`: Total number of inference requests
   - `inference_requests_success`: Number of successful inference requests
   - `inference_requests_failure`: Number of failed inference requests
   - `inference_latency`: Histogram of inference latency in milliseconds 
(optional; subclasses can override the histogram implementation)
   - `inference_rows_output`: Total number of output rows from inference
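
   How one inference call would update this schema can be sketched with plain counters. This is an assumption-laden illustration, not the PR's code: `InferenceMetrics`, `recordSuccess`, and `recordFailure` are hypothetical helpers; only the metric names come from the list above.

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   // Hypothetical tracker keyed by the metric names listed above.
   class InferenceMetrics {
       final Map<String, Long> metrics = new LinkedHashMap<>();

       InferenceMetrics() {
           for (String name : new String[] {
                   "inference_requests", "inference_requests_success",
                   "inference_requests_failure", "inference_rows_output"}) {
               metrics.put(name, 0L);
           }
       }

       private void inc(String name, long delta) {
           metrics.merge(name, delta, Long::sum);
       }

       // A successful call counts the request, the success, and the rows it produced.
       void recordSuccess(long rows) {
           inc("inference_requests", 1);
           inc("inference_requests_success", 1);
           inc("inference_rows_output", rows);
       }

       // A failed call still counts the request, plus the failure.
       void recordFailure() {
           inc("inference_requests", 1);
           inc("inference_requests_failure", 1);
       }
   }
   ```

   With this shape, `inference_requests` always equals the sum of the success and failure counters, which makes error-rate dashboards straightforward.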
   
   ## Benefits
   1. **Zero code changes for subclasses**: All existing model implementations 
automatically get metrics
   2. **Consistent monitoring**: All model functions use the same metrics schema
   3. **Extensible**: Subclasses can override `createLatencyHistogram()` for 
custom implementations
   4. **Performance insight**: Provides visibility into model inference 
performance and reliability


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
