featzhang commented on code in PR #27561:
URL: https://github.com/apache/flink/pull/27561#discussion_r2792034667
##########
flink-models/flink-model-triton/src/main/java/org/apache/flink/model/triton/TritonInferenceModelFunction.java:
##########
@@ -217,26 +236,92 @@ public void onResponse(Call call, Response response)
} catch (Exception e) {
LOG.error("Failed to build Triton inference request", e);
- future.completeExceptionally(e);
+ handleFailureWithRetry(rowData, future, attemptNumber, e);
}
+ }
- return future;
+ /**
+ * Handles request failure with retry logic or default value fallback.
+ *
+ * @param rowData Input data for inference
+ * @param future The future to complete
+ * @param attemptNumber Current attempt number
+ * @param error The error that caused the failure
+ */
+ private void handleFailureWithRetry(
+ RowData rowData,
+ CompletableFuture<Collection<RowData>> future,
+ int attemptNumber,
+ Throwable error) {
+
+ if (attemptNumber < getMaxRetries()) {
+ // Calculate exponential backoff delay
+ long delayMs = getRetryBackoff().toMillis() * (1L << attemptNumber);
+
+ LOG.info(
+ "Retrying Triton inference request (attempt {}/{}) after {} ms",
+ attemptNumber + 2,
+ getMaxRetries() + 1,
+ delayMs);
+
+ // Schedule retry with exponential backoff
+ CompletableFuture.delayedExecutor(delayMs, java.util.concurrent.TimeUnit.MILLISECONDS)
+ .execute(() -> asyncPredictWithRetry(rowData, future, attemptNumber + 1));
+ } else {
+ // All retries exhausted
+ if (getDefaultValue() != null) {
+ LOG.warn(
+ "All {} retry attempts failed. Returning configured default value. Last error: {}",
+ getMaxRetries() + 1,
+ error.getMessage());
+
+ try {
+ Collection<RowData> defaultResult = parseDefaultValue();
Review Comment:
Good catch! I've updated the error handling to preserve the original
exceptions:
- The original error is now logged with its full stack trace.
- When parsing the default value fails, the resulting exception includes both
the original inference error and the parse error.
- `addSuppressed()` is used to keep the complete exception chain available for
debugging.
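
For illustration, a minimal standalone sketch of the pattern described above. It is not the code from the PR: the class `FallbackErrorChaining`, the `String`-based `parseDefaultValue(String)` stand-in, and the choice of the parse error as the cause with the inference error attached as suppressed are all assumptions made for this example.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical example class; not part of the Flink Triton module.
final class FallbackErrorChaining {

    // Stand-in for the real default-value parser (assumption): rejects empty input.
    static String parseDefaultValue(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            throw new IllegalArgumentException("Default value is empty");
        }
        return raw.trim();
    }

    // Completes the future with the parsed default value. If parsing fails,
    // completes it exceptionally with an exception that carries both errors.
    static void completeWithDefault(
            CompletableFuture<String> future, String rawDefault, Throwable inferenceError) {
        try {
            future.complete(parseDefaultValue(rawDefault));
        } catch (Exception parseError) {
            RuntimeException combined =
                    new RuntimeException(
                            "Inference failed and the default value could not be parsed",
                            parseError);
            // Keep the original inference failure visible in the stack trace.
            combined.addSuppressed(inferenceError);
            future.completeExceptionally(combined);
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> future = new CompletableFuture<>();
        completeWithDefault(
                future, "  ", new IllegalStateException("Triton request timed out"));
        future.whenComplete(
                (value, error) -> {
                    System.out.println("Failure: " + error.getMessage());
                    System.out.println("Cause: " + error.getCause().getMessage());
                    System.out.println(
                            "Suppressed: " + error.getSuppressed()[0].getMessage());
                });
    }
}
```

With this arrangement a logged stack trace shows the parse error under "Caused by" and the inference error under "Suppressed", so neither failure is lost.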