ChuanFF opened a new pull request, #12941:
URL: https://github.com/apache/apisix/pull/12941
## Description
This PR enhances the `ai-rag` plugin with multi-provider support, optional
reranking capability, and a standardized chat interface that works with native
LLM request formats.
### Key Improvements
#### 1. Multiple Embedding Provider Support
* Support three embedding backends via unified schema:
* `openai` – OpenAI embeddings API
* `azure-openai` – Azure OpenAI Service
* `openai-compatible` – Any OpenAI-compatible endpoint
* Shared implementation via `openai-base.lua` with provider-specific
configuration
* Added `model` and `dimensions` options for flexibility
#### 2. Optional Rerank Stage
* Introduced optional reranking after vector search to improve document
relevance
* Initial implementation: **Cohere Rerank**
* Configurable model and `top_n` parameter
* Graceful fallback to original results on failure
* Rerank is fully optional and only executed when configured
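For reference, the fallback behavior can be sketched in Python (the plugin itself is Lua; `rerank_fn` and all names here are illustrative stand-ins for the Cohere v2/rerank call, not the actual implementation):

```python
def rerank_documents(docs, query, rerank_fn, top_n=3):
    """Rerank retrieved docs; fall back to the originals on any failure.

    `rerank_fn` is a placeholder for the rerank API call. Cohere-style
    responses are assumed: a list of {"index": int, "relevance_score": float}.
    """
    try:
        scored = rerank_fn(query=query, documents=docs, top_n=top_n)
    except Exception:
        # Graceful fallback: keep the original vector-search ordering.
        return docs[:top_n]
    # Reorder documents by relevance score, highest first.
    scored.sort(key=lambda r: r["relevance_score"], reverse=True)
    return [docs[r["index"]] for r in scored[:top_n]]
```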
#### 3. Standardized Chat Interface
* **Removed dependency on custom `ai_rag` field** in request body
* Plugin now works with standard LLM chat formats:
  ```json
  {
    "messages": [
      {"role": "user", "content": "What is APISIX?"}
    ]
  }
  ```
* Added `rag_config.input_strategy` to control text extraction:
* `last` (default): uses only the latest user message
* `all`: concatenates all user messages with newlines
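The two strategies behave roughly as in this Python sketch (illustrative only; the plugin is implemented in Lua):

```python
def extract_query(messages, input_strategy="last"):
    """Extract the retrieval query from a standard chat messages array.

    Mirrors rag_config.input_strategy: "last" uses only the latest user
    message; "all" joins every user message with newlines.
    """
    user_texts = [m["content"] for m in messages if m["role"] == "user"]
    if input_strategy == "all":
        return "\n".join(user_texts)
    # Default "last": only the most recent user message.
    return user_texts[-1] if user_texts else ""
```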
#### 4. Improved Context Injection
* Retrieved (and optionally reranked) documents are injected as contextual
user messages
* Context is inserted **before the latest user message**, ensuring relevance
to the current query:
```text
Context:
[doc1]\n\n[doc2]\n\n...
```
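The insertion logic amounts to the following (a Python sketch under the same assumptions as above; the actual code is Lua):

```python
def inject_context(messages, docs):
    """Insert a context message immediately before the latest user message.

    `docs` are the retrieved (and optionally reranked) document strings.
    """
    context = {"role": "user", "content": "Context:\n" + "\n\n".join(docs)}
    # Walk backwards to find the latest user message.
    for i in range(len(messages) - 1, -1, -1):
        if messages[i]["role"] == "user":
            messages.insert(i, context)
            return messages
    # No user message found: append the context at the end.
    messages.append(context)
    return messages
```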
#### 5. Vector Search Enhancements (Azure AI Search)
* Renamed provider to `azure-ai-search` (kebab-case consistency)
* Extended configuration options:
* `fields` – fields to search against
* `select` – fields to return in results
* `k` – number of nearest neighbors (default: 5)
* `exhaustive` – whether to perform exhaustive search (default: true)
* Returns parsed documents instead of raw response bodies
#### 6. Schema Improvements
* Simplified plugin schema using `oneOf` for provider selection
* Added descriptive field documentation
* Removed request-side `ai_rag` payload requirements
---
## Backward Compatibility
⚠️ **This PR introduces breaking changes:**
| Area | Before | After |
|------|--------|-------|
| **Request format** | Required `ai_rag` field with nested configs | Standard `messages` array only |
| **Azure OpenAI key** | `azure_openai` (snake_case) | `azure-openai` (kebab-case) |
| **Context position** | Appended as final message | Inserted before latest user message |
| **Vector search output** | Raw JSON string | Table of document strings |
This redesign is intentional to support more flexible and production-ready
RAG workflows.
---
## Example Configuration
```json
{
  "embeddings_provider": {
    "openai": {
      "endpoint": "https://api.openai.com/v1/embeddings",
      "api_key": "sk-xxx",
      "model": "text-embedding-3-large",
      "dimensions": 1536
    }
  },
  "vector_search_provider": {
    "azure-ai-search": {
      "endpoint": "https://xxx.search.windows.net/indexes/xxx/docs/search?api-version=2023-11-01",
      "api_key": "xxx",
      "fields": "vector",
      "select": "content",
      "k": 5,
      "exhaustive": true
    }
  },
  "rerank_provider": {
    "cohere": {
      "endpoint": "https://api.cohere.ai/v2/rerank",
      "api_key": "xxx",
      "model": "rerank-english-v3.0",
      "top_n": 3
    }
  },
  "rag_config": {
    "input_strategy": "last"
  }
}
```
---
## Motivation
The original `ai-rag` implementation had several limitations:
1. **Vendor lock-in**: Only supported Azure OpenAI embeddings
2. **Custom request format**: Required `ai_rag` field, making integration
with standard LLM APIs cumbersome
3. **No reranking**: Retrieved documents were used as-is without relevance
scoring
4. **Rigid context injection**: Context was appended after the user query,
which could dilute the user query's importance
This enhancement addresses these issues by:
* Enabling **multiple embedding providers** with minimal code duplication
* Supporting **realistic RAG pipelines** (retrieve → rerank → augment)
* Simplifying integration with **standard LLM chat APIs**
* Providing better control over **context injection behavior**