membphis commented on PR #13578: URL: https://github.com/apache/apisix/pull/13578#issuecomment-4786071487
I found two merge-blocking issues in the current `ai-cache` implementation:

### [P1] Cache key does not include the effective model or picked AI instance

`ai-cache` computes the fingerprint from `ctx.var.request_llm_model or body.model`, but it does not include `ctx.picked_ai_instance_name`, provider, or the route / instance effective `options.model`:

- `apisix/plugins/ai-cache/key.lua`: the fingerprint uses only protocol, requested model, normalized messages, and remaining body params.
- `apisix/plugins/ai-cache.lua`: the lookup happens in `access`, before the upstream request is built.
- `ai-proxy-multi` has already selected `ctx.picked_ai_instance` before lower-priority plugins run, so that selected instance is available at cache lookup time.

This can return the wrong provider/model response on an `ai-proxy-multi` route. A request can warm the cache through instance A, then a later identical request can be routed to instance B but still hit and replay instance A's response because both requests share the same cache key.

This should be fixed before merge by including the selected AI instance and/or effective model/provider in the cache key or scope, with a regression test covering `ai-proxy-multi` instances that use different models or providers.

### [P2] The plugin can cache ordinary JSON traffic when it is not behind `ai-proxy`

The docs say `ai-cache` must be used with `ai-proxy` or `ai-proxy-multi`, but the implementation does not enforce or safely bypass that condition. `ai-cache.access` reads any JSON request body, computes a key, and marks the request as `MISS`; then `log` writes any 200 response to Redis. There is no `ctx.picked_ai_instance` guard like the existing AI moderation plugins use.

If the plugin is accidentally attached at Route / Service / Consumer level without an AI proxy, ordinary JSON upstream responses can be cached and replayed. That is a surprising behavior and can leak stale or incorrect non-AI responses.
Please add a guard before key computation, either bypassing by default or using the shared `ai-protocols.binding` `fail_mode` behavior, and add coverage for the no-`ai-proxy` case.
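
To make the two requests concrete, here is a minimal, language-neutral sketch in Python rather than the plugin's Lua. It is not the `ai-cache` code: the `ctx` and `body` dictionaries, the `cache_key` function, and the `provider` / `options.model` field layout are assumptions chosen for illustration. It shows one possible shape of the fix: return no key (bypass) when no AI instance was picked, and otherwise fold the picked instance name, provider, and effective model into the fingerprint.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def cache_key(ctx: Dict[str, Any], body: Dict[str, Any]) -> Optional[str]:
    """Return a cache key, or None when the request should bypass the cache.

    Hypothetical stand-in for the key computation discussed above; the
    dictionary layout is assumed, not taken from the APISIX source.
    """
    instance = ctx.get("picked_ai_instance")
    if instance is None:
        # [P2] No AI proxy picked an instance, so treat this as ordinary
        # traffic and skip both lookup and write-back.
        return None

    options = instance.get("options") or {}
    scope = {
        # [P1] Scope the key to the instance that will serve the request.
        "instance": ctx.get("picked_ai_instance_name"),
        "provider": instance.get("provider"),
        # Effective model: the instance override wins over the requested one.
        "model": options.get("model") or body.get("model"),
    }
    # Remaining body params, excluding fields already represented above.
    params = {k: v for k, v in body.items() if k not in ("model", "messages")}
    material = {
        "scope": scope,
        "messages": body.get("messages"),
        "params": params,
    }
    # sort_keys gives a stable serialization regardless of key order.
    encoded = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()
```

With this shape, two identical request bodies routed to different instances produce different keys, and a request with no picked instance produces no key at all. A regression test for the `ai-proxy-multi` case and the no-`ai-proxy` case could assert exactly those two properties.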
