janiussyafiq commented on PR #13308: URL: https://github.com/apache/apisix/pull/13308#issuecomment-4443532662
Closing this PR to address @membphis's review properly. Splitting the work into a phased series so the cache-key correctness conversation can be reviewed independently of semantic caching, embedding providers, and multi-protocol support: - **Phase 1**: exact (L1) cache only, **for `openai-chat` protocol only**, with a conservative cache key that accounts for everything affecting LLM output — effective model post-override, protocol, picked upstream instance, full structured `messages` (not concatenated text), whitelisted generation parameters, stream flag, and operator-scoped consumer/vars. All `openai-chat`-compatible providers (Azure OpenAI, DeepSeek, OpenRouter, Together, Fireworks, Ollama, etc.) work transparently. Sliced into 5 small reviewable PRs. - **Phase 2**: additional client protocols (`openai-responses`, `anthropic-messages`, `bedrock-converse`) — one PR each. - **Phase 3+**: semantic (L2) embedding-based cache as its own design track. The full plan including PR breakdown, cache-key invariants, and per-PR test gates is captured in a PRD on my fork: https://github.com/janiussyafiq/apisix/issues/13 The first PR in the series — a small, behavior-preserving `ai-proxy` refactor that extracts the instance-override application logic into helpers (so the cache key can reflect operator-side overrides correctly) — will follow shortly. Thanks for the careful review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
