featzhang created FLINK-39225:
---------------------------------

             Summary: Add retry with default value fallback for triton 
inference failures
                 Key: FLINK-39225
                 URL: https://issues.apache.org/jira/browse/FLINK-39225
             Project: Flink
          Issue Type: Sub-task
          Components: Table SQL / Runtime
            Reporter: featzhang


Adds retry mechanism with default value fallback for Triton model inference 
failures, enabling robust error handling and downstream filtering.
h2. Brief change log
h3. 1. New Configuration Options (TritonOptions.java)
 * {{max-retries}}: Maximum number of retry attempts (default: 0)
 * {{retry-backoff}}: Initial backoff duration; later retries back off 
exponentially (default: 100ms)
 * {{default-value}}: Fallback value returned when all retries fail
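
As a rough illustration of how {{max-retries}} and {{retry-backoff}} interact, the sketch below computes the wait before each retry. The class {{BackoffSchedule}} is hypothetical and not part of the patch, and the doubling factor is an assumption inferred from the "100ms, 200ms, 400ms" example later in this issue.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the Flink code base. */
final class BackoffSchedule {

    /**
     * Returns the wait before each retry: the initial backoff for the first
     * retry, then twice the previous wait for every later one.
     * The factor of 2 is an assumption, not something the issue states.
     */
    static List<Duration> delays(Duration initialBackoff, int maxRetries) {
        List<Duration> delays = new ArrayList<>();
        Duration next = initialBackoff;
        for (int i = 0; i < maxRetries; i++) {
            delays.add(next);
            next = next.multipliedBy(2);
        }
        return delays;
    }
}
```

With the default {{max-retries}} of 0 the schedule is empty, so the first failure is final.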

h3. 2. Retry Logic (TritonInferenceModelFunction.java)
 * Implements exponential backoff retry strategy
 * Retries on network errors and 5xx server errors (503, 504)
 * Fails immediately on 4xx client errors (configuration issues)
 * Detailed logging for each retry attempt
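
The retry rules above could look roughly like the following. This is a minimal sketch, not the code in {{TritonInferenceModelFunction.java}}: {{RetryingCaller}}, {{HttpStatusException}} and {{callWithRetry}} are hypothetical names, treating every 5xx status as retryable is an assumption, and a real implementation would use a logger rather than {{System.err}}.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical sketch; names are illustrative and not taken from the patch. */
final class RetryingCaller {

    /** Illustrative exception carrying an HTTP status code. */
    static final class HttpStatusException extends Exception {
        final int status;

        HttpStatusException(int status) {
            super("HTTP " + status);
            this.status = status;
        }
    }

    /** Network errors and 5xx responses are retried; everything else (e.g. 4xx) is not. */
    static boolean isRetryable(Exception e) {
        if (e instanceof HttpStatusException) {
            int status = ((HttpStatusException) e).status;
            return status >= 500 && status <= 599; // assumption: all 5xx are retryable
        }
        return e instanceof IOException;
    }

    /** Runs the call once, then up to maxRetries more times with doubling waits. */
    static <T> T callWithRetry(Callable<T> call, int maxRetries, long initialBackoffMillis)
            throws Exception {
        long backoff = initialBackoffMillis;
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (!isRetryable(e) || attempt >= maxRetries) {
                    throw e; // non-retryable error, or retries exhausted
                }
                System.err.printf("Attempt %d failed (%s); retrying in %d ms%n",
                        attempt + 1, e.getMessage(), backoff);
                Thread.sleep(backoff);
                backoff *= 2;
            }
        }
    }
}
```

Checking {{isRetryable}} before the attempt count means a 4xx response surfaces after a single call, regardless of how {{max-retries}} is set.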

h3. 3. Default Value Fallback
 * Returns configured default value after exhausting all retries
 * Supports all output types: STRING, numeric, ARRAY
 * Enables downstream view-based routing for success/failure cases
 * Backward compatible: throws exceptions if no default value configured
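
The fallback and its backward-compatible behaviour can be sketched as below. {{DefaultValueFallback}} and {{inferOrDefault}} are hypothetical names, and the {{inference}} argument is assumed to stand for a call that has already gone through all of its retries.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

/** Hypothetical sketch of the fallback step; not the code from the patch. */
final class DefaultValueFallback {

    /**
     * Returns the inference result, or the configured default if the call fails.
     * An empty defaultValue keeps the old behaviour: the exception is rethrown.
     */
    static <T> T inferOrDefault(Callable<T> inference, Optional<T> defaultValue)
            throws Exception {
        try {
            return inference.call();
        } catch (Exception e) {
            if (defaultValue.isPresent()) {
                return defaultValue.get(); // e.g. 'FAILED' for a STRING output
            }
            throw e; // no default configured: fail as before
        }
    }
}
```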

h3. 4. AbstractTritonModelFunction.java
 * Added fields and getters for retry configuration

h2. Use Cases

{*}Scenario{*}: After N consecutive failures, return a default value that 
downstream can use to route records to success/failure paths.

{*}Example Configuration{*}:
CREATE MODEL my_triton_model
WITH (
  'provider' = 'triton',
  'endpoint' = 'http://triton:8000/v2/models',
  'model-name' = 'my-model',
  'max-retries' = '3',          -- Retry up to 3 times
  'retry-backoff' = '100ms',    -- 100ms, 200ms, 400ms backoff
  'default-value' = 'FAILED'    -- Return 'FAILED' on all failures
);
 
{*}Downstream Processing{*}:
-- Route based on prediction result
INSERT INTO success_table
SELECT * FROM predictions WHERE result != 'FAILED';

INSERT INTO failure_table
SELECT * FROM predictions WHERE result = 'FAILED';
 


