janiussyafiq opened a new pull request, #13578: URL: https://github.com/apache/apisix/pull/13578
### Description

Adds a new `ai-cache` plugin that caches LLM responses and replays them for subsequent requests that resolve to the same prompt, cutting upstream token cost and latency for repetitive workloads (FAQ bots, document Q&A, translation).

This PR implements the **exact (L1)** cache layer:

- **Cache key**: a SHA-256 fingerprint of the request as received: client protocol, requested model, normalized messages, and the remaining response-determining body parameters (`temperature`, `top_p`, `max_tokens`, `tools`, …). Provider-agnostic via `ai-protocols`, so it works for every chat protocol `ai-proxy` supports (OpenAI Chat, Anthropic Messages, Bedrock Converse, OpenAI Responses).
- **Storage**: Redis (single-node); connection fields are sourced from `apisix.utils.redis-schema` via the `policy` + `if/then` convention used by `limit-count` / `limit-req` / `limit-conn`.
- **Scope**: shared cache by default; opt-in per-consumer / per-variable isolation (`cache_key.include_consumer` / `include_vars`).
- **Behavior**: write-on-2xx only (non-streaming); `cache_bypass` opt-out (proxy-cache convention); `max_cache_body_size` cap; `X-AI-Cache-Status` / `X-AI-Cache-Age` response headers; fails open (proxies as a normal miss) when Redis is unreachable.
- Runs below `ai-proxy` (priority `1035`) and depends on `ai-proxy` / `ai-proxy-multi`.

Semantic cache, streaming support, and observability are planned as follow-up PRs. User-facing documentation will be added in a later PR once the series is further along.

#### Which issue(s) this PR fixes:

Related to #13290

### Checklist

- [x] I have explained the need for this PR and the problem it solves
- [x] I have explained the changes or the new features added to this PR
- [x] I have added tests corresponding to this change
- [ ] I have updated the documentation to reflect this change
- [x] I have verified that this change is backward compatible
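For reviewers who want a concrete picture of the cache key described in the PR, here is a minimal Python sketch of a SHA-256 fingerprint over protocol, model, normalized messages, and body parameters. It is illustrative only and is not the plugin's Lua code: the function names, the `PARAM_FIELDS` tuple, the whitespace-trimming normalization, and the way consumer and variable values are mixed in are all assumptions made for this example.

```python
import hashlib
import json

# Hypothetical parameter list for illustration. The PR names temperature,
# top_p, max_tokens and tools, and elides the rest.
PARAM_FIELDS = ("temperature", "top_p", "max_tokens", "tools")


def normalize_messages(messages):
    """Keep role and content only; trim whitespace on string content.

    This normalization is an assumption, not the plugin's actual rule.
    """
    normalized = []
    for message in messages:
        content = message.get("content", "")
        if isinstance(content, str):
            content = content.strip()
        normalized.append({"role": message.get("role", ""), "content": content})
    return normalized


def cache_key(protocol, body, consumer=None, extra_vars=None):
    """Return a hex SHA-256 digest of the response-determining request fields."""
    material = {
        "protocol": protocol,
        "model": body.get("model"),
        "messages": normalize_messages(body.get("messages", [])),
        "params": {name: body[name] for name in PARAM_FIELDS if name in body},
    }
    # Optional isolation, loosely mirroring include_consumer / include_vars.
    if consumer is not None:
        material["consumer"] = consumer
    if extra_vars:
        material["vars"] = extra_vars
    # Sorted keys and fixed separators make the serialization deterministic,
    # so field order in the incoming JSON body does not change the key.
    canonical = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under these assumptions, two requests that differ only in JSON field order or surrounding whitespace map to the same key, while a different protocol, model, parameter value, or consumer produces a different one.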
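The behavior rules in the PR (write on 2xx only, non-streaming only, body-size cap, bypass, fail open) can be read as a small decision flow. The Python sketch below models that flow with an in-memory stand-in for Redis so it runs without a server. Everything here is hypothetical scaffolding: `MemoryStore`, `StoreUnavailable`, `handle`, and the `HIT` / `MISS` / `BYPASS` header values are assumptions for illustration, since the PR names the headers but not their values.

```python
import time


class StoreUnavailable(Exception):
    """Raised by the stand-in store when the backend cannot be reached."""


class MemoryStore:
    """In-memory stand-in for Redis; set `down = True` to simulate an outage."""

    def __init__(self):
        self.data = {}
        self.down = False

    def get(self, key):
        if self.down:
            raise StoreUnavailable()
        return self.data.get(key)

    def set(self, key, value):
        if self.down:
            raise StoreUnavailable()
        self.data[key] = value


def handle(store, key, upstream, bypass=False, stream=False,
           max_body_size=1024, now=time.time):
    """Return (status, body, headers) for one request.

    `upstream` is a callable returning (status, body).
    """
    if bypass:
        # Bypass skips both lookup and write.
        status, body = upstream()
        return status, body, {"X-AI-Cache-Status": "BYPASS"}

    try:
        entry = store.get(key)
    except StoreUnavailable:
        entry = None  # fail open: treat an unreachable store as a miss

    if entry is not None:
        age = int(now() - entry["stored_at"])
        headers = {"X-AI-Cache-Status": "HIT", "X-AI-Cache-Age": str(age)}
        return entry["status"], entry["body"], headers

    status, body = upstream()
    cacheable = 200 <= status < 300 and not stream and len(body) <= max_body_size
    if cacheable:
        try:
            store.set(key, {"status": status, "body": body, "stored_at": now()})
        except StoreUnavailable:
            pass  # a failed write must not fail the request
    return status, body, {"X-AI-Cache-Status": "MISS"}
```

In this model a store outage never changes what the client receives, only whether the upstream is called, which is the property the "fails open" bullet describes.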
