featzhang opened a new pull request, #27560: URL: https://github.com/apache/flink/pull/27560
This PR adds built-in metrics for monitoring model inference performance in both `AsyncPredictFunction` and `PredictFunction`.

## Changes

- Add built-in metrics for model inference monitoring
- Metrics include: requests, success/failure counts, latency, output rows
- Subclasses can override `createLatencyHistogram()` to provide a custom histogram
- All model implementations (OpenAI, Triton, etc.) automatically get metrics
- Zero code changes needed in subclasses

## Metrics Provided

The following metrics are automatically tracked:

- `inference_requests`: total number of inference requests
- `inference_requests_success`: number of successful inference requests
- `inference_requests_failure`: number of failed inference requests
- `inference_latency`: histogram of inference latency in milliseconds (optional; the histogram implementation can be overridden)
- `inference_rows_output`: total number of output rows from inference

## Benefits

1. **Zero code changes for subclasses**: all existing model implementations automatically get metrics
2. **Consistent monitoring**: all model functions use the same metrics schema
3. **Extensible**: subclasses can override `createLatencyHistogram()` for custom implementations
4. **Performance insight**: provides visibility into model inference performance and reliability

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
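To illustrate the pattern the PR describes, here is a minimal, self-contained sketch of a base predict function that wraps each inference call with the five metrics and exposes an overridable `createLatencyHistogram()` hook. This is not the actual Flink code from the PR: the class and field names below (`ModelPredictFunction`, `SimpleHistogram`, the plain counter fields) are illustrative stand-ins, and a real implementation would register `Counter`/`Histogram` instances on Flink's `MetricGroup` in `open()` rather than use raw fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Minimal stand-in for Flink's org.apache.flink.metrics.Histogram. */
interface Histogram {
    void update(long value);
    long getCount();
}

/** Simple list-backed histogram used as the default implementation. */
class SimpleHistogram implements Histogram {
    private final List<Long> values = new ArrayList<>();
    public void update(long value) { values.add(value); }
    public long getCount() { return values.size(); }
}

/** Hypothetical base class wrapping inference calls with the PR's metrics. */
abstract class ModelPredictFunction {
    long requests;          // inference_requests
    long requestsSuccess;   // inference_requests_success
    long requestsFailure;   // inference_requests_failure
    long rowsOutput;        // inference_rows_output
    final Histogram latency = createLatencyHistogram(); // inference_latency

    /** Subclasses may override to supply a custom histogram implementation. */
    protected Histogram createLatencyHistogram() {
        return new SimpleHistogram();
    }

    /** Runs one inference call, recording all five metrics around it. */
    final List<String> predict(Function<String, List<String>> model, String input) {
        requests++;
        long start = System.nanoTime();
        try {
            List<String> rows = model.apply(input);
            requestsSuccess++;
            rowsOutput += rows.size();
            return rows;
        } catch (RuntimeException e) {
            requestsFailure++;
            throw e;
        } finally {
            // Record latency on both success and failure paths.
            latency.update((System.nanoTime() - start) / 1_000_000);
        }
    }
}
```

Because the instrumentation lives in the (non-overridable) `predict` wrapper, a concrete model such as an OpenAI or Triton client only implements the call itself and inherits the metrics for free, which is the "zero code changes in subclasses" property the PR claims.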
