ChuanFF opened a new pull request, #12941:
URL: https://github.com/apache/apisix/pull/12941
## Description
This PR enhances the `ai-rag` plugin with multi-provider support, optional
reranking capability, and a standardized chat interface that works with native
LLM request formats.
### Key Improvements
#### 1. Multiple Embedding Provider Support
* Support three embedding backends via unified schema:
* `openai` – OpenAI embeddings API
* `azure-openai` – Azure OpenAI Service
* `openai-compatible` – Any OpenAI-compatible endpoint
* Shared implementation via `openai-base.lua` with provider-specific
configuration
* Added `model` and `dimensions` options for flexibility
#### 2. Optional Rerank Stage
* Introduced optional reranking after vector search to improve document
relevance
* Initial implementation: **Cohere Rerank**
* Configurable model and `top_n` parameter
* Graceful fallback to original results on failure
* Rerank is fully optional and only executed when configured
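For reference, the fallback behavior can be sketched in Python (the plugin itself is Lua; `rerank_fn` and all names here are illustrative stand-ins for the Cohere v2/rerank call, not the actual implementation):

```python
def rerank_documents(docs, query, rerank_fn, top_n=3):
    """Rerank retrieved docs; fall back to the originals on any failure.

    `rerank_fn` is a placeholder for the rerank API call. Cohere-style
    responses are assumed: a list of {"index": int, "relevance_score": float}.
    """
    try:
        scored = rerank_fn(query=query, documents=docs, top_n=top_n)
    except Exception:
        # Graceful fallback: keep the original vector-search ordering.
        return docs[:top_n]
    # Reorder documents by relevance score, highest first.
    scored.sort(key=lambda r: r["relevance_score"], reverse=True)
    return [docs[r["index"]] for r in scored[:top_n]]
```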
#### 3. Standardized Chat Interface
* **Removed dependency on custom `ai_rag` field** in request body
* Plugin now works with standard LLM chat formats:
  ```json
  {
    "messages": [
      {"role": "user", "content": "What is APISIX?"}
    ]
  }
  ```
* Added `rag_config.input_strategy` to control text extraction:
* `last` (default): uses only the latest user message
* `all`: concatenates all user messages with newlines
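The two strategies behave roughly as in this Python sketch (illustrative only; the plugin is implemented in Lua):

```python
def extract_query(messages, input_strategy="last"):
    """Extract the retrieval query from a standard chat messages array.

    Mirrors rag_config.input_strategy: "last" uses only the latest user
    message; "all" joins every user message with newlines.
    """
    user_texts = [m["content"] for m in messages if m["role"] == "user"]
    if input_strategy == "all":
        return "\n".join(user_texts)
    # Default "last": only the most recent user message.
    return user_texts[-1] if user_texts else ""
```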
#### 4. Improved Context Injection
* Retrieved (and optionally reranked) documents are injected as contextual
user messages
* Context is inserted **before the latest user message**, ensuring relevance
to the current query:
```text
Context:
[doc1]\n\n[doc2]\n\n...
```
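The insertion logic amounts to the following (a Python sketch under the same assumptions as above; the actual code is Lua):

```python
def inject_context(messages, docs):
    """Insert a context message immediately before the latest user message.

    `docs` are the retrieved (and optionally reranked) document strings.
    """
    context = {"role": "user", "content": "Context:\n" + "\n\n".join(docs)}
    # Walk backwards to find the latest user message.
    for i in range(len(messages) - 1, -1, -1):
        if messages[i]["role"] == "user":
            messages.insert(i, context)
            return messages
    # No user message found: append the context at the end.
    messages.append(context)
    return messages
```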
#### 5. Vector Search Enhancements (Azure AI Search)
* Renamed provider to `azure-ai-search` (kebab-case consistency)
* Extended configuration options:
* `fields` – fields to search against
* `select` – fields to return in results
* `k` – number of nearest neighbors (default: 5)
* `exhaustive` – whether to perform exhaustive search (default: true)
* Returns parsed documents instead of raw response bodies
#### 6. Schema Improvements
* Simplified plugin schema using `oneOf` for provider selection
* Added descriptive field documentation
* Removed request-side `ai_rag` payload requirements
---
## Backward Compatibility
⚠️ **This PR introduces breaking changes:**
| Area | Before | After |
|------|--------|-------|
| **Request format** | Required `ai_rag` field with nested configs | Standard `messages` array only |
| **Azure OpenAI key** | `azure_openai` (snake_case) | `azure-openai` (kebab-case) |
| **Context position** | Appended as final message | Inserted before latest user message |
| **Vector search output** | Raw JSON string | Table of document strings |
This redesign is intentional to support more flexible and production-ready
RAG workflows.
---
## Example Configuration
```json
{
  "embeddings_provider": {
    "openai": {
      "endpoint": "https://api.openai.com/v1/embeddings",
      "api_key": "sk-xxx",
      "model": "text-embedding-3-large",
      "dimensions": 1536
    }
  },
  "vector_search_provider": {
    "azure-ai-search": {
      "endpoint": "https://xxx.search.windows.net/indexes/xxx/docs/search?api-version=2023-11-01",
      "api_key": "xxx",
      "fields": "vector",
      "select": "content",
      "k": 5,
      "exhaustive": true
    }
  },
  "rerank_provider": {
    "cohere": {
      "endpoint": "https://api.cohere.ai/v2/rerank",
      "api_key": "xxx",
      "model": "rerank-english-v3.0",
      "top_n": 3
    }
  },
  "rag_config": {
    "input_strategy": "last"
  }
}
```
---
## Motivation
The original `ai-rag` implementation had several limitations:
1. **Vendor lock-in**: Only supported Azure OpenAI embeddings
2. **Custom request format**: Required `ai_rag` field, making integration
with standard LLM APIs cumbersome
3. **No reranking**: Retrieved documents were used as-is without relevance
scoring
4. **Rigid context injection**: Context was appended after the user query,
which could dilute the user query's importance
This enhancement addresses these issues by:
* Enabling **multiple embedding providers** with minimal code duplication
* Supporting **realistic RAG pipelines** (retrieve → rerank → augment)
* Simplifying integration with **standard LLM chat APIs**
* Providing better control over **context injection behavior**