purushah opened a new pull request, #852:
URL: https://github.com/apache/flink-agents/pull/852
## What is the purpose of the change
Adds a **drop-in chat model that routes each request to the best underlying
model**, then delegates to it. The router is a `CHAT_MODEL` resource, so an
agent points at it by name with **no change to the runtime, events, or agent
definition**.
This is the *in-chat* selector (which LLM serves a single `chat()` call). A
DataStream-level content-based agent router (branching records across agent
operators) is a separate, follow-up concern.
## Brief change log
- **`RoutingStrategy`** — pluggable selection SPI (`request -> candidate
name`). Selection is a *pure* concern; returning `null` means "abstain / no
opinion".
- **`ChatModelRouter`** — orchestrates select → (optional cache) → validate
→ delegate. A strategy that abstains (`null`) or names a non-candidate is a
**routing miss** and degrades to the configured `default` candidate (validated
at construction; defaults to the first candidate) rather than failing the
request.
- **`FallbackPolicy`**: optional; when the selected candidate fails with an error, the remaining candidates are tried.
- **`CachingStrategy`** — optional bounded-LRU memoization of the decision
per conversation, so an expensive strategy (e.g. an LLM judge) runs **once per
conversation**, not once per tool-call round. Abstentions (`null`) are never
cached.
- Built-in strategies:
  - **`RuleBasedRoutingStrategy`**: deterministic keyword/regex rules +
    default.
  - **`LlmRoutingStrategy`**: a small "judge" model picks the candidate
    from each candidate's name/description (RouteLLM-style). Distinguishes a
    **transient judge failure** (abstain → retried next round, uncached) from an
    **unparseable reply** (deterministic default). Parses by whole-token match (no
    substring mis-routing, e.g. `gpt-4o-mini` won't match a `gpt-4` candidate).
- **Bring-your-own** strategies are first-class: implement `RoutingStrategy`
and reference it by fully-qualified class name; loaded via the thread context
classloader (cluster-safe). ML/learned routing is supported the same way.
- Adds `LlmRoutingAgentExample` and unit tests.
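
The select → cache → validate → degrade-to-default flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchRoutingStrategy`, `SketchRouter`, and `SketchDecisionCache` are hypothetical names, the request is simplified to a `String`, and the conversation id used as the cache key is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the SPI: request text -> candidate name, or null to abstain. */
interface SketchRoutingStrategy {
    String select(String request);
}

/** Illustrative router: select, validate against the candidates, degrade to the default on a miss. */
final class SketchRouter {
    private final List<String> candidates;
    private final String defaultCandidate;
    private final SketchRoutingStrategy strategy;

    SketchRouter(List<String> candidates, String defaultCandidate, SketchRoutingStrategy strategy) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("at least one candidate is required");
        }
        // The default falls back to the first candidate and is validated at construction.
        String resolved = defaultCandidate != null ? defaultCandidate : candidates.get(0);
        if (!candidates.contains(resolved)) {
            throw new IllegalArgumentException("default is not a candidate: " + resolved);
        }
        this.candidates = List.copyOf(candidates);
        this.defaultCandidate = resolved;
        this.strategy = strategy;
    }

    String route(String request) {
        String choice = strategy.select(request);
        // An abstention (null) or an unknown name is a routing miss, not a failure.
        if (choice == null || !candidates.contains(choice)) {
            return defaultCandidate;
        }
        return choice;
    }
}

/** Illustrative bounded-LRU memoization per conversation; abstentions are never stored. */
final class SketchDecisionCache {
    private final Map<String, String> lru;

    SketchDecisionCache(int maxEntries) {
        // accessOrder=true keeps the least recently used entry first, so it is evicted first.
        this.lru = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    synchronized String select(String conversationId, String request, SketchRoutingStrategy delegate) {
        String cached = lru.get(conversationId);
        if (cached != null) {
            return cached;
        }
        String choice = delegate.select(request);
        if (choice != null) {
            // Only real decisions are cached, so an abstaining strategy is asked again next round.
            lru.put(conversationId, choice);
        }
        return choice;
    }
}
```

In this sketch a strategy that returns `null` never poisons the cache, which is what lets a transient judge failure be retried on the next round.
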
## Verifying this change
This change adds tests and can be verified as follows:
- Unit tests under `api/.../chat/model/routing/` covering rule selection,
judge parsing (whole-token match), stickiness across tool-call rounds,
fallback, caching (incl. abstain-not-cached), routing-miss degrade-to-default,
and bring-your-own loading. All pass; `spotless:check` clean (JDK 17).
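
The whole-token judge parsing covered by these tests can be illustrated with a small sketch. It is an assumption about the general technique, not the PR's parser: `SketchJudgeReplyParser` is a hypothetical name, model names are assumed to consist of letters, digits, `.`, `_`, and `-`, and returning the first matching token is a simplification.

```java
import java.util.List;
import java.util.Optional;

/** Illustrative whole-token parser for a judge model's free-text reply. */
final class SketchJudgeReplyParser {

    /** Returns the first candidate that appears in the reply as a complete token, if any. */
    static Optional<String> parse(String reply, List<String> candidates) {
        // Split on every character that cannot be part of a model name (assumed charset).
        String[] tokens = reply.split("[^A-Za-z0-9._-]+");
        for (String token : tokens) {
            // Drop sentence punctuation stuck to the token, e.g. the final '.' in "gpt-4.".
            String cleaned = token.replaceAll("^[._-]+|[._-]+$", "");
            // Exact equality, never contains(): "gpt-4o-mini" must not match candidate "gpt-4".
            if (candidates.contains(cleaned)) {
                return Optional.of(cleaned);
            }
        }
        // Nothing recognizable: the caller treats this as an unparseable reply.
        return Optional.empty();
    }
}
```

An empty result here corresponds to the "unparseable reply" case, which the description above says resolves to the deterministic default.
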
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): **no**
- The public API: **yes** — adds the
`org.apache.flink.agents.api.chat.model.routing` package (additive; no existing
API changed).
- The serializers: **no**
- The runtime per-record code paths: **no** (router is a `CHAT_MODEL`
resource resolved by name)
- Anything that affects deployment or recovery: **no** — preserves
exactly-once / keyed-state / checkpoint semantics (no new operator, no nested
invocation).
## Security note
An LLM/ML routing decision is a **hint, not an authority** — the user's
message is sent to the judge model, so a routing decision is susceptible to
prompt injection. Cost/privilege/safety controls must not be gated solely on
it. This is documented on `LlmRoutingStrategy`.
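
One way to treat the routing decision as a hint is to check it against an entitlement that is decided independently of the judge. The sketch below is purely illustrative and not part of this PR: `SketchModelGuard`, the per-caller allow-list, and the safe default are all hypothetical.

```java
import java.util.Set;

/** Illustrative guard: the router proposes a model, an independent allow-list decides. */
final class SketchModelGuard {

    /** Returns the routed model only if the caller may use it; otherwise the safe default. */
    static String enforce(String routedModel, Set<String> allowedForCaller, String safeDefault) {
        // The allow-list comes from configuration or identity, never from the LLM's reply,
        // so a prompt-injected routing decision cannot escalate cost or privilege.
        return allowedForCaller.contains(routedModel) ? routedModel : safeDefault;
    }
}
```
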
## Documentation
- New public package is documented via javadoc on each type. Built-in
strategies, the abstain/routing-miss contract, and the bring-your-own extension
point are described on the SPI.