kjprice opened a new issue, #13083:
URL: https://github.com/apache/apisix/issues/13083

   ## Description
   
   Currently, `ai-proxy-multi` selects AI instances entirely server-side via 
configured `priority`, `weight`, and `fallback_strategy`. The client has no way 
to influence which model or provider handles their request, or in what order 
fallbacks should be attempted.
   
   This proposal adds support for a `models` field in the request body, 
allowing clients to specify their preferred model ordering while the gateway 
retains full control over authentication, rate limiting, and provider 
configuration.
   
   ## Proposed API
   
   Clients include a `models` array in the chat completions request body. Each 
element can be:
   
   **Object form** (full control):
   ```json
   {
     "messages": [{"role": "user", "content": "Hello"}],
     "models": [
       {"provider": "anthropic", "model": "claude-sonnet-4-20250514"},
       {"provider": "openai", "model": "gpt-4o"}
     ]
   }
   ```
   
   **String shorthand** (just model names, matched against configured 
instances):
   ```json
   {
     "messages": [{"role": "user", "content": "Hello"}],
     "models": ["claude-sonnet-4-20250514", "gpt-4o"]
   }
   ```
   
   **Mixed**:
   ```json
   {
     "messages": [{"role": "user", "content": "Hello"}],
     "models": [
       {"provider": "anthropic", "model": "claude-sonnet-4-20250514"},
       "gpt-4o"
     ]
   }
   ```
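   The three accepted shapes reduce to a single internal form. As an illustrative sketch (Python for readability; not the plugin's actual Lua code), each entry normalizes to a `(provider, model)` pair, with the provider left unset for string shorthand:

```python
# Hypothetical normalization sketch: reduce every `models` entry to a
# (provider, model) pair. String entries carry no provider, so they
# later match configured instances by model name alone.
def normalize_entry(entry):
    if isinstance(entry, str):
        return (None, entry)                            # string shorthand
    return (entry.get("provider"), entry["model"])      # object form
```

   Normalizing up front keeps the later matching step a single code path regardless of which shape the client sent.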
   
   ## Behavior
   
   1. The `models` array defines the **client's preferred instance ordering** 
(first element = highest priority).
   2. Each entry is matched against the route's configured `ai-proxy-multi` 
instances by `model` name and optionally `provider`.
   3. String entries match by model name only. Object entries can match by 
`provider` + `model` for disambiguation (e.g., two providers serving the same 
model).
   4. Unmatched entries are ignored (the client cannot introduce 
providers/models not configured on the route).
   5. Configured instances not referenced in `models` are appended after the 
client's preferred list, in their original priority order.
   6. The existing `fallback_strategy` still applies — if the top-priority 
instance fails, the plugin falls back through the client-specified order, then 
server-configured instances.
   7. Auth, rate limiting, and all other instance config remain 
server-controlled.
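   Steps 1-5 amount to a stable reordering of the configured instance list. A minimal sketch of that logic (in Python for illustration; function and field names are hypothetical, not the plugin's internals):

```python
# Illustrative sketch of steps 1-5: put client-preferred instances first,
# ignore client entries that match nothing, and append the remaining
# configured instances in their original server-side order.
def reorder_instances(configured, client_models):
    def matches(inst, entry):
        if isinstance(entry, str):                      # string shorthand:
            return inst["model"] == entry               # match by model only
        return (inst["model"] == entry["model"]
                and ("provider" not in entry
                     or inst["provider"] == entry["provider"]))

    preferred, rest = [], list(configured)
    for entry in client_models:
        for inst in rest:
            if matches(inst, entry):
                preferred.append(inst)
                rest.remove(inst)
                break                                   # first match wins
    return preferred + rest                             # step 5
```

   Each client entry claims at most one configured instance (the first match), so two providers serving the same model name stay disambiguable via the object form.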
   
   ## Configuration
   
   A new optional boolean on the plugin config lets operators opt in:
   
   ```yaml
   plugins:
     ai-proxy-multi:
       allow_client_model_preference: true   # default: false
       instances:
         - name: anthropic-primary
           provider: anthropic
           options:
             model: claude-sonnet-4-20250514
           # ...
         - name: openai-fallback
           provider: openai
           options:
             model: gpt-4o
           # ...
   ```
   
   When `allow_client_model_preference` is `false` (default), the `models` 
field is stripped from the request body and instance ordering is purely 
server-driven. This ensures backward compatibility.
   
   ## Use Case
   
   We're building an LLM proxy gateway where multiple teams consume AI services 
through a shared gateway. Different clients have different model preferences:
   - Team A prefers Claude with GPT-4o fallback
   - Team B prefers GPT-4o with Claude fallback
   - The gateway team manages auth keys, rate limits, and provider configs 
centrally
   
   Today this requires separate routes per team/preference. With client-driven 
model selection, a single route handles all teams while respecting their 
preferences.
   
   ## Implementation Notes
   
   - The matching logic would live in `ai-proxy-multi.lua`, executed before 
instance selection in the `access` phase.
   - The `models` field should be stripped from the request body before 
forwarding to the upstream provider (providers don't recognize it).
   - String matching should be case-sensitive to match provider model naming 
conventions.
   - When using the `chash` balancer, client preference should take precedence over 
hash-based selection.
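   For the body-stripping note above, a hedged sketch of the transformation (again in Python for illustration; the real implementation would live in the plugin's Lua code):

```python
import json

# Remove the gateway-only `models` field before forwarding, so the
# upstream provider never sees a field it does not recognize; all other
# request fields pass through unchanged.
def strip_models_field(raw_body):
    body = json.loads(raw_body)
    body.pop("models", None)    # no-op if the field is absent
    return json.dumps(body)
```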
   
   I'm happy to submit a PR for this feature if the approach looks good.

