This is an automated email from the ASF dual-hosted git repository.

wenming pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/apisix-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 5fb702cc3ff blog: add introducing-apisix-ai-gateway (#1888)
5fb702cc3ff is described below

commit 5fb702cc3ff6288a44ada5916c041ed541314fdf
Author: Yilia Lin <114121331+yilial...@users.noreply.github.com>
AuthorDate: Wed Apr 9 13:54:59 2025 +0800

    blog: add introducing-apisix-ai-gateway (#1888)
---
 .../2025/04/08/introducing-apisix-ai-gateway.md    | 304 ++++++++++++++++++++
 .../2025/04/08/introducing-apisix-ai-gateway.md    | 305 +++++++++++++++++++++
 2 files changed, 609 insertions(+)

diff --git a/blog/en/blog/2025/04/08/introducing-apisix-ai-gateway.md 
b/blog/en/blog/2025/04/08/introducing-apisix-ai-gateway.md
new file mode 100644
index 00000000000..e4ff6e76dd2
--- /dev/null
+++ b/blog/en/blog/2025/04/08/introducing-apisix-ai-gateway.md
@@ -0,0 +1,304 @@
+---
+title: "Introducing APISIX AI Gateway"
+authors:
+  - name: Yilia Lin
+    title: Technical Writer
+    url: https://github.com/Yilialinn
+    image_url: https://github.com/Yilialinn.png
+keywords:
+  - AI Gateway
+  - APISIX AI Gateway
+  - API Gateway
+  - AI Plugins
+  - API Management
+  - AI-Driven Applications
+description: In Apache APISIX version 3.12.0, we have further enhanced its AI support capabilities as a modern API gateway.
+tags: [Ecosystem]
+image: https://static.api7.ai/uploads/2025/03/07/Qs4WrU0I_apisix-ai-gateway.webp
+---
+
+In Apache APISIX version 3.12.0, we have further enhanced its AI support capabilities as a modern API gateway. Through a rich plugin ecosystem and flexible architectural design, we provide developers with a complete AI gateway product.
+
+This article delves into APISIX's innovative practices in the AI gateway domain from the following perspectives.
+
+## Core Functions of the AI Gateway
+
+APISIX's plugin ecosystem offers out-of-the-box capabilities for AI scenarios. Below are the core plugins and their functions:
+
+### Proxy and Request Management
+
+#### 1. ai-proxy
+
+The [`ai-proxy`](https://apisix.apache.org/docs/apisix/plugins/ai-proxy/) plugin simplifies access to large language models (LLMs) and embedding models by transforming plugin configurations into the designated request format. It supports integration with OpenAI, DeepSeek, and other OpenAI-compatible services.
+
+Additionally, the plugin supports logging LLM request information in the access log, such as token usage, model, time to first response, and more.
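+
+**Example: Proxying to a Single Provider**:
+
+As a minimal sketch (not from the original post: the route values and `gpt-4` model name are illustrative, and the configuration shape mirrors the `ai-proxy-multi` example in the next section), a route that forwards all matching requests to a single OpenAI model could look like this:
+
+```bash
+curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
+  -H "X-API-KEY: ${ADMIN_API_KEY}" \
+  -d '{
+    "id": "ai-proxy-route",
+    "uri": "/anything",
+    "methods": ["POST"],
+    "plugins": {
+      "ai-proxy": {
+        "provider": "openai",
+        "auth": {
+          "header": {
+            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
+          }
+        },
+        "options": {
+          "model": "gpt-4"
+        }
+      }
+    }
+  }'
+```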
+
+#### 2. ai-proxy-multi
+
+The [`ai-proxy-multi`](https://apisix.apache.org/docs/apisix/plugins/ai-proxy-multi/) plugin simplifies access to large language models (LLMs) and embedding models by transforming plugin configurations into the request format required by OpenAI, DeepSeek, and other OpenAI-compatible services. It extends the capabilities of `ai-proxy` with load balancing, retries, fallbacks, and health checks.
+
+Additionally, the plugin supports logging LLM request information in the access log, such as token usage, model, time to first response, and more.
+
+**Example: Load Balancing**:
+
+The following example demonstrates how to configure two models for load balancing, forwarding 80% of the traffic to one instance and 20% to another.
+
+For demonstration and easier differentiation, you will configure one OpenAI instance and one DeepSeek instance as the upstream LLM services. Create a route and update it with your LLM providers, models, API keys, and endpoints as applicable, setting the weight of the `openai-instance` to `8` and the weight of the `deepseek-instance` to `2`.
+
+```bash
+curl "http://127.0.0.1:9180/apisix/admin/routes"; -X PUT \
+  -H "X-API-KEY: ${ADMIN_API_KEY}" \
+  -d '{
+    "id": "ai-proxy-multi-route",
+    "uri": "/anything",
+    "methods": ["POST"],
+    "plugins": {
+      "ai-proxy-multi": {
+        "instances": [
+          {
+            "name": "openai-instance",
+            "provider": "openai",
+            "weight": 8,
+            "auth": {
+              "header": {
+                "Authorization": "Bearer '"$OPENAI_API_KEY"'"
+              }
+            },
+            "options": {
+              "model": "gpt-4"
+            }
+          },
+          {
+            "name": "deepseek-instance",
+            "provider": "deepseek",
+            "weight": 2,
+            "auth": {
+              "header": {
+                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
+              }
+            },
+            "options": {
+              "model": "deepseek-chat"
+            }
+          }
+        ]
+      }
+    }
+  }'
+```
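+
+To sanity-check the 80/20 split, you could send a handful of identical requests and note which `model` each response reports (a rough verification, assuming both API keys are valid and the route above is in place):
+
+```bash
+# send 10 chat requests; roughly 8 responses should come from gpt-4 and 2 from deepseek-chat
+for i in $(seq 1 10); do
+  curl -s "http://127.0.0.1:9080/anything" -X POST \
+    -H "Content-Type: application/json" \
+    -d '{"messages": [{"role": "user", "content": "What is 1+1?"}]}'
+done
+```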
+
+#### 3. ai-request-rewrite
+
+The [`ai-request-rewrite`](https://apisix.apache.org/docs/apisix/plugins/ai-request-rewrite) plugin processes client requests by forwarding them to large language model (LLM) services for transformation before relaying them to upstream services. This enables AI-driven redaction, enrichment, and reformatting of requests. The plugin supports integration with OpenAI, DeepSeek, and other OpenAI-compatible APIs.
+
+**Example: Redacting Sensitive Information**:
+
+The following example demonstrates how you can use the `ai-request-rewrite` plugin to redact sensitive information before the request reaches the upstream service.
+
+Create a route and configure the `ai-request-rewrite` plugin as follows, specifying the provider as `openai`, attaching the OpenAI API key in the `Authorization` header, specifying the model name, and describing the information to redact.
+
+```shell
+curl "http://127.0.0.1:9180/apisix/admin/routes"; -X PUT \
+  -H "X-API-KEY: ${ADMIN_API_KEY}" \
+  -d '{
+    "id": "ai-request-rewrite-route",
+    "uri": "/anything",
+    "methods": ["POST"],
+    "plugins": {
+      "ai-request-rewrite": {
+        "provider": "openai",
+        "auth": {
+          "header": {
+            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
+          }
+        },
+        "options":{
+          "model": "gpt-4"
+        },
+        "prompt": "Given a JSON request body, identify and mask any sensitive 
information such as credit card numbers, social security numbers, and personal 
identification numbers (e.g., passport or driver's license numbers). Replace 
detected sensitive values with a masked format (e.g., \"*** **** **** 1234\") 
for credit card numbers. Ensure the JSON structure remains unchanged."
+      }
+    },
+    "upstream": {
+      "type": "roundrobin",
+      "nodes": {
+        "httpbin.org:80": 1
+      }
+    }
+  }'
+```
+
+Send a POST request to the route with some personally identifiable information:
+
+```shell
+curl "http://127.0.0.1:9080/anything"; -X POST \
+  -H "Content-Type: application/json" \
+  -d '{
+    "content": "John said his debit card number is 4111 1111 1111 1111 and SIN 
is 123-45-6789."
+  }'
+```
+
+Example response:
+
+```json
+{
+  "args": {},
+  # highlight-next-line
+  "data": "{\n    \"content\": \"John said his debit card number is **** **** **** 1111 and SIN is ***-**-****.\"\n  }",
+  ...,
+  "json": {
+    "messages": [
+      {
+        "content": "Client information from customer service calls",
+        "role": "system"
+      },
+      {
+        # highlight-next-line
+        "content": "John said his debit card number is **** **** **** 1111 and SIN is ***-**-****.",
+        "role": "user"
+      }
+    ],
+    "model": "openai"
+  },
+  "method": "POST",
+  "origin": "192.168.97.1, 103.97.2.170",
+  "url": "http://127.0.0.1/anything"
+}
+```
+
+### Traffic Control
+
+#### 4. ai-rate-limiting
+
+The [`ai-rate-limiting`](https://apisix.apache.org/docs/apisix/plugins/ai-rate-limiting/) plugin enforces token-based rate limiting for requests sent to large language model (LLM) services. It helps manage API usage by controlling the number of tokens consumed within a specified time frame, ensuring fair resource allocation and preventing excessive load on the service. It is often used in conjunction with the `ai-proxy-multi` plugin.
+
+**Example: Rate Limiting a Single Instance**:
+
+The following example demonstrates how to use `ai-proxy-multi` to configure two models for load balancing, forwarding 80% of the traffic to one instance and 20% to another. Additionally, use `ai-rate-limiting` to configure token-based rate limiting on the instance that receives 80% of the traffic, so that when the configured quota is fully consumed, the excess traffic is forwarded to the other instance.
+
+Create a route and update it with your LLM providers, models, API keys, and endpoints as needed:
+
+```bash
+curl "http://127.0.0.1:9180/apisix/admin/routes"; -X PUT \
+  -H "X-API-KEY: ${ADMIN_API_KEY}" \
+  -d '{
+    "id": "ai-rate-limiting-route",
+    "uri": "/anything",
+    "methods": ["POST"],
+    "plugins": {
+      "ai-rate-limiting": {
+        "instances": [
+          {
+            "name": "deepseek-instance-1",
+            "provider": "deepseek",
+            "weight": 8,
+            "auth": {
+              "header": {
+                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
+              }
+            },
+            "options": {
+              "model": "deepseek-chat"
+            }
+          },
+          {
+            "name": "deepseek-instance-2",
+            "provider": "deepseek",
+            "weight": 2,
+            "auth": {
+              "header": {
+                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
+              }
+            },
+            "options": {
+              "model": "deepseek-chat"
+            }
+          }
+        ]
+      },
+      "ai-rate-limiting": {
+        "instances": [
+          {
+            "name": "deepseek-instance-1",
+            "limit_strategy": "total_tokens",
+            "limit": 100,
+            "time_window": 30
+          }
+        ]
+      }
+    }
+  }'
+```
+
+Send a POST request to the route with a system prompt and a sample user question in the request body:
+
+```bash
+curl "http://127.0.0.1:9080/anything"; -X POST \
+  -H "Content-Type: application/json" \
+  -d '{
+    "messages": [
+      { "role": "system", "content": "You are a mathematician" },
+      { "role": "user", "content": "What is 1+1?" }
+    ]
+  }'
+```
+
+Example response:
+
+```json
+{
+  ...
+  "model": "deepseek-chat",
+  "choices": [
+    {
+      "index": 0,
+      "message": {
+        "role": "assistant",
+        "content": "1 + 1 equals 2. This is a fundamental arithmetic operation 
where adding one unit to another results in a total of two units."
+      },
+      "logprobs": null,
+      "finish_reason": "stop"
+    }
+  ],
+  ...
+}
+```
+
+If the `deepseek-instance-1` instance's rate limiting quota of 100 tokens has been consumed within a 30-second window, additional requests will be forwarded to the `deepseek-instance-2` instance, which is not rate limited.
+
+### Prompt Processing
+
+#### 5. ai-prompt-decorator
+
+The [`ai-prompt-decorator`](https://apisix.apache.org/docs/apisix/plugins/ai-prompt-decorator/) plugin sets the context for content generation by adding pre-designed prompts before and after user input. This practice helps the model operate according to the intended guidelines during interactions.
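+
+As a brief illustration (not from the original post; this assumes the plugin's `prepend` and `append` message lists, and the contents shown are placeholders), a decorator that frames every user prompt with support-agent context might be configured like this:
+
+```bash
+curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
+  -H "X-API-KEY: ${ADMIN_API_KEY}" \
+  -d '{
+    "id": "ai-prompt-decorator-route",
+    "uri": "/anything",
+    "methods": ["POST"],
+    "plugins": {
+      "ai-prompt-decorator": {
+        "prepend": [
+          { "role": "system", "content": "You are a courteous customer support agent. Answer concisely." }
+        ],
+        "append": [
+          { "role": "system", "content": "End every reply with a clarifying follow-up question." }
+        ]
+      }
+    }
+  }'
+```
+
+In practice, this plugin would sit alongside `ai-proxy` or `ai-proxy-multi` on the same route so that the decorated conversation is what actually reaches the LLM.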
+
+#### 6. ai-prompt-template
+
+The [`ai-prompt-template`](https://apisix.apache.org/docs/apisix/plugins/ai-prompt-template/) plugin supports pre-configured prompt templates that only accept user input in specified template variables, operating in a "fill-in-the-blank" manner.
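+
+For example (a sketch assuming the plugin's `templates` field and a `template_name` selector in the request body; names and values here are illustrative), a "fill-in-the-blank" template might look like this:
+
+```bash
+curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
+  -H "X-API-KEY: ${ADMIN_API_KEY}" \
+  -d '{
+    "id": "ai-prompt-template-route",
+    "uri": "/anything",
+    "methods": ["POST"],
+    "plugins": {
+      "ai-prompt-template": {
+        "templates": [
+          {
+            "name": "summarize",
+            "template": {
+              "model": "gpt-4",
+              "messages": [
+                { "role": "user", "content": "Summarize the following text in {{ style }} style: {{ text }}" }
+              ]
+            }
+          }
+        ]
+      }
+    }
+  }'
+```
+
+A client would then supply only the template name and its variables, e.g. `{"template_name": "summarize", "style": "bullet-point", "text": "..."}`.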
+
+#### 7. ai-prompt-guard
+
+The [`ai-prompt-guard`](https://apisix.apache.org/docs/apisix/plugins/ai-prompt-guard/) plugin protects your large language model (LLM) endpoints by inspecting and validating incoming prompt messages. It checks the request content against user-defined allow and deny patterns, ensuring only approved input is forwarded to the upstream LLM. Depending on its configuration, the plugin can check either the latest message or the entire conversation history, and can be set to inspect prompts from all roles or only from end users.
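+
+A minimal sketch of such a policy (assuming the plugin's `deny_patterns` and `match_all_roles` fields; the patterns themselves are placeholders) could be:
+
+```bash
+curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
+  -H "X-API-KEY: ${ADMIN_API_KEY}" \
+  -d '{
+    "id": "ai-prompt-guard-route",
+    "uri": "/anything",
+    "methods": ["POST"],
+    "plugins": {
+      "ai-prompt-guard": {
+        "match_all_roles": true,
+        "deny_patterns": [
+          "ignore previous instructions",
+          "system prompt"
+        ]
+      }
+    }
+  }'
+```
+
+Requests whose prompt content matches a deny pattern would be rejected before ever reaching the upstream LLM.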
+
+### Content Moderation
+
+#### 8. ai-aws-content-moderation
+
+The [`ai-aws-content-moderation`](https://apisix.apache.org/docs/apisix/plugins/ai-aws-content-moderation/) plugin integrates with AWS Comprehend to check the request body for toxic content, such as profanity, hate speech, insults, harassment, and violence, when proxying to large language models (LLMs), and rejects requests when the evaluation result exceeds the configured threshold.
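+
+A rough configuration sketch (the `comprehend` credential block and `moderation_categories` thresholds below follow the plugin reference as best recalled; treat every value as a placeholder and verify against the linked docs):
+
+```bash
+curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
+  -H "X-API-KEY: ${ADMIN_API_KEY}" \
+  -d '{
+    "id": "ai-aws-content-moderation-route",
+    "uri": "/anything",
+    "methods": ["POST"],
+    "plugins": {
+      "ai-aws-content-moderation": {
+        "comprehend": {
+          "access_key_id": "'"$AWS_ACCESS_KEY_ID"'",
+          "secret_access_key": "'"$AWS_SECRET_ACCESS_KEY"'",
+          "region": "us-east-1"
+        },
+        "moderation_categories": {
+          "PROFANITY": 0.5
+        }
+      }
+    }
+  }'
+```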
+
+### Data Enhancement
+
+#### 9. ai-rag
+
+The [`ai-rag`](https://apisix.apache.org/docs/apisix/plugins/ai-rag/) plugin provides retrieval-augmented generation (RAG) capabilities for large language models (LLMs). It efficiently retrieves relevant documents or information from external data sources to enhance LLM responses, improving the accuracy and context relevance of the generated output. The plugin supports using Azure OpenAI and Azure AI Search services to generate embeddings and perform vector searches.
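+
+A configuration sketch (assuming the plugin's `embeddings_provider` and `vector_search_provider` fields; the Azure endpoints, API versions, and keys are placeholders to be replaced with your own):
+
+```bash
+curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
+  -H "X-API-KEY: ${ADMIN_API_KEY}" \
+  -d '{
+    "id": "ai-rag-route",
+    "uri": "/anything",
+    "methods": ["POST"],
+    "plugins": {
+      "ai-rag": {
+        "embeddings_provider": {
+          "azure_openai": {
+            "endpoint": "https://<your-resource>.openai.azure.com/openai/deployments/<deployment>/embeddings?api-version=2023-05-15",
+            "api_key": "'"$AZURE_OPENAI_API_KEY"'"
+          }
+        },
+        "vector_search_provider": {
+          "azure_ai_search": {
+            "endpoint": "https://<your-service>.search.windows.net/indexes/<index>/docs/search?api-version=2024-07-01",
+            "api_key": "'"$AZURE_AI_SEARCH_KEY"'"
+          }
+        }
+      }
+    }
+  }'
+```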
+
+## Observability
+
+APISIX has enhanced observability for AI applications, enabling real-time monitoring of key metrics such as time to first generated token (TTFT), token usage, and error rates. These capabilities help teams optimize costs, promptly identify performance issues, and ensure transparency through detailed logs and auditing mechanisms. Additionally, APISIX tracks token usage through access logs and observability components, effectively preventing API abuse and avoiding overbilling issues.
+
+## Summary
+
+In the [Apache APISIX 3.12.0](https://apisix.apache.org/blog/2025/04/01/release-apache-apisix-3.12.0/) release, the APISIX AI Gateway has strengthened its AI support capabilities as a modern API gateway through a rich plugin ecosystem and flexible architectural design.
+
+It offers features such as proxy and request management, traffic control, prompt processing, content moderation, and data enhancement, with support for integration with services like OpenAI and DeepSeek. Performance and security are optimized through mechanisms like load balancing, rate limiting, and content filtering.
+
+Moreover, APISIX has enhanced observability, enabling real-time monitoring of key metrics to help teams optimize costs, identify performance issues, and ensure transparency. This release provides developers with a powerful, flexible, and secure platform for building and managing AI-driven applications.
diff --git a/blog/zh/blog/2025/04/08/introducing-apisix-ai-gateway.md 
b/blog/zh/blog/2025/04/08/introducing-apisix-ai-gateway.md
new file mode 100644
index 00000000000..ff84713635c
--- /dev/null
+++ b/blog/zh/blog/2025/04/08/introducing-apisix-ai-gateway.md
@@ -0,0 +1,305 @@
+---
+title: "APISIX AI 网关介绍"
+authors:
+  - name: Yilia Lin
+    title: Technical Writer
+    url: https://github.com/Yilialinn
+    image_url: https://github.com/Yilialinn.png
+keywords:
+  - AI Gateway
+  - APISIX AI Gateway
+  - API Gateway
+  - AI Plugins
+  - API Management
+  - AI-Driven Applications
+description: In Apache APISIX version 3.12.0, we have further enhanced its AI support capabilities as a modern API gateway. Through a rich plugin ecosystem and flexible architectural design, we provide developers with a complete AI gateway product.
+tags: [Ecosystem]
+image: https://static.api7.ai/uploads/2025/03/07/Qs4WrU0I_apisix-ai-gateway.webp
+---
+
+In Apache APISIX version 3.12.0, we have further enhanced its AI support capabilities as a modern API gateway. Through a rich plugin ecosystem and flexible architectural design, we provide developers with a complete AI gateway product.
+
+This article delves into APISIX's innovative practices in the AI gateway domain from the following perspectives.
+
+## Core Functions of the AI Gateway
+
+APISIX's plugin ecosystem offers out-of-the-box capabilities for AI scenarios. Below are the core plugins and their functions:
+
+### Proxy and Request Management
+
+1. **ai-proxy**
+
+The [`ai-proxy`](https://apisix.apache.org/zh/docs/apisix/plugins/ai-proxy/) plugin simplifies access to large language models (LLMs) and embedding models by transforming plugin configurations into the designated request format. It supports integration with OpenAI, DeepSeek, and other OpenAI-compatible services.
+
+Additionally, the plugin supports logging LLM request information in the access log, such as token usage, model, time to first response, and more.
+
+2. **ai-proxy-multi**
+
+The [`ai-proxy-multi`](https://apisix.apache.org/zh/docs/apisix/plugins/ai-proxy-multi/) plugin simplifies access to large language models (LLMs) and embedding models by transforming plugin configurations into the request format required by OpenAI, DeepSeek, and other OpenAI-compatible services. It extends the capabilities of `ai-proxy` with load balancing, retries, fallbacks, and health checks.
+
+Additionally, the plugin supports logging LLM request information in the access log, such as token usage, model, time to first response, and more.
+
+**Example: Load Balancing**:
+
+The following example demonstrates how to configure two models for load balancing, forwarding 80% of the traffic to one instance and 20% to another.
+
+For demonstration and easier differentiation, we will configure one OpenAI instance and one DeepSeek instance as the upstream LLM services. Create a route and update it with your LLM providers, models, API keys, and endpoints as needed, setting the weight of the `openai-instance` to `8` and the weight of the `deepseek-instance` to `2`.
+
+```bash
+curl "http://127.0.0.1:9180/apisix/admin/routes"; -X PUT \
+  -H "X-API-KEY: ${ADMIN_API_KEY}" \
+  -d '{
+    "id": "ai-proxy-multi-route",
+    "uri": "/anything",
+    "methods": ["POST"],
+    "plugins": {
+      "ai-proxy-multi": {
+        "instances": [
+          {
+            "name": "openai-instance",
+            "provider": "openai",
+            "weight": 8,
+            "auth": {
+              "header": {
+                "Authorization": "Bearer '"$OPENAI_API_KEY"'"
+              }
+            },
+            "options": {
+              "model": "gpt-4"
+            }
+          },
+          {
+            "name": "deepseek-instance",
+            "provider": "deepseek",
+            "weight": 2,
+            "auth": {
+              "header": {
+                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
+              }
+            },
+            "options": {
+              "model": "deepseek-chat"
+            }
+          }
+        ]
+      }
+    }
+  }'
+```
+
+3. **ai-request-rewrite**
+
+The [`ai-request-rewrite`](https://apisix.apache.org/zh/docs/apisix/plugins/ai-request-rewrite) plugin processes client requests by forwarding them to large language model (LLM) services for transformation before relaying them to the upstream service. This enables the plugin to use an LLM for modifications such as data masking, content enrichment, and reformatting. In addition, it supports integration with OpenAI, DeepSeek, and other OpenAI-compatible APIs.
+
+**Example: Redacting Sensitive Information**:
+
+The following example demonstrates how to use the `ai-request-rewrite` plugin to redact sensitive information before the request reaches the upstream service.
+
+Create a route and configure the `ai-request-rewrite` plugin as follows, specifying the provider as `openai`, attaching the OpenAI API key in the `Authorization` header, specifying the model name, and describing the information to redact.
+
+```shell
+curl "http://127.0.0.1:9180/apisix/admin/routes"; -X PUT \
+  -H "X-API-KEY: ${ADMIN_API_KEY}" \
+  -d '{
+    "id": "ai-request-rewrite-route",
+    "uri": "/anything",
+    "methods": ["POST"],
+    "plugins": {
+      "ai-request-rewrite": {
+        "provider": "openai",
+        "auth": {
+          "header": {
+            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
+          }
+        },
+        "options":{
+          "model": "gpt-4"
+        },
+        "prompt": "Given a JSON request body, identify and mask any sensitive 
information such as credit card numbers, social security numbers, and personal 
identification numbers (e.g., passport or driver'\''s license numbers). Replace 
detected sensitive values with a masked format (e.g., \"*** **** **** 1234\") 
for credit card numbers. Ensure the JSON structure remains unchanged."
+      }
+    },
+    "upstream": {
+      "type": "roundrobin",
+      "nodes": {
+        "httpbin.org:80": 1
+      }
+    }
+  }'
+```
+
+Send a POST request to the route with some personally identifiable information:
+
+```shell
+curl "http://127.0.0.1:9080/anything"; -X POST \
+  -H "Content-Type: application/json" \
+  -d '{
+    "content": "John said his debit card number is 4111 1111 1111 1111 and SIN 
is 123-45-6789."
+  }'
+```
+
+Example response:
+
+```json
+{
+  "args": {},
+  # highlight-next-line
+  "data": "{\n    \"content\": \"John said his debit card number is **** **** **** 1111 and SIN is ***-**-****.\"\n  }",
+  ...,
+  "json": {
+    "messages": [
+      {
+        "content": "Client information from customer service calls",
+        "role": "system"
+      },
+      {
+        # highlight-next-line
+        "content": "John said his debit card number is **** **** **** 1111 and SIN is ***-**-****.",
+        "role": "user"
+      }
+    ],
+    "model": "openai"
+  },
+  "method": "POST",
+  "origin": "192.168.97.1, 103.97.2.170",
+  "url": "http://127.0.0.1/anything"
+}
+```
+
+### Traffic Control
+
+4. **ai-rate-limiting**
+
+The [`ai-rate-limiting`](https://apisix.apache.org/zh/docs/apisix/plugins/ai-rate-limiting/) plugin enforces token-based rate limiting on requests sent to large language model (LLM) services. By controlling the number of tokens consumed within a specified time window, it helps manage API usage, ensures fair resource allocation, and prevents service overload. It is often used in conjunction with the `ai-proxy-multi` plugin.
+
+**Example: Rate Limiting a Single Instance**:
+
+The following example demonstrates how to use `ai-proxy-multi` to configure two models for load balancing, forwarding 80% of the traffic to one instance and 20% to another. Additionally, use `ai-rate-limiting` to configure token-based rate limiting on the instance that receives 80% of the traffic, so that when the configured quota is fully consumed, the excess traffic is forwarded to the other instance.
+
+Create a route and update it with your LLM providers, models, API keys, and endpoints as needed:
+
+```bash
+curl "http://127.0.0.1:9180/apisix/admin/routes"; -X PUT \
+  -H "X-API-KEY: ${ADMIN_API_KEY}" \
+  -d '{
+    "id": "ai-rate-limiting-route",
+    "uri": "/anything",
+    "methods": ["POST"],
+    "plugins": {
+      "ai-rate-limiting": {
+        "instances": [
+          {
+            "name": "deepseek-instance-1",
+            "provider": "deepseek",
+            "weight": 8,
+            "auth": {
+              "header": {
+                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
+              }
+            },
+            "options": {
+              "model": "deepseek-chat"
+            }
+          },
+          {
+            "name": "deepseek-instance-2",
+            "provider": "deepseek",
+            "weight": 2,
+            "auth": {
+              "header": {
+                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
+              }
+            },
+            "options": {
+              "model": "deepseek-chat"
+            }
+          }
+        ]
+      },
+      "ai-rate-limiting": {
+        "instances": [
+          {
+            "name": "deepseek-instance-1",
+            "limit_strategy": "total_tokens",
+            "limit": 100,
+            "time_window": 30
+          }
+        ]
+      }
+    }
+  }'
+```
+
+Send a POST request to the route with a system prompt and a sample user question in the request body:
+
+```bash
+curl "http://127.0.0.1:9080/anything"; -X POST \
+  -H "Content-Type: application/json" \
+  -d '{
+    "messages": [
+      { "role": "system", "content": "You are a mathematician" },
+      { "role": "user", "content": "What is 1+1?" }
+    ]
+  }'
+```
+
+Example response:
+
+```json
+{
+  ...
+  "model": "deepseek-chat",
+  "choices": [
+    {
+      "index": 0,
+      "message": {
+        "role": "assistant",
+        "content": "1 + 1 equals 2. This is a fundamental arithmetic operation 
where adding one unit to another results in a total of two units."
+      },
+      "logprobs": null,
+      "finish_reason": "stop"
+    }
+  ],
+  ...
+}
+```
+
+If the `deepseek-instance-1` instance consumes its rate limiting quota of 100 tokens within the 30-second window, any additional requests will be forwarded to the `deepseek-instance-2` instance, which is not rate limited.
+
+### Prompt Processing
+
+5. **ai-prompt-decorator**
+
+The [`ai-prompt-decorator`](https://apisix.apache.org/zh/docs/apisix/plugins/ai-prompt-decorator/) plugin sets the context for content generation by adding pre-designed prompts before and after user input. This practice helps the model operate according to the intended guidelines during interactions.
+
+6. **ai-prompt-template**
+
+The [`ai-prompt-template`](https://apisix.apache.org/zh/docs/apisix/plugins/ai-prompt-template/) plugin supports pre-configured prompt templates that only accept user input in specified template variables, operating in a "fill-in-the-blank" manner.
+
+7. **ai-prompt-guard**
+
+The [`ai-prompt-guard`](https://apisix.apache.org/zh/docs/apisix/plugins/ai-prompt-guard/) plugin protects your large language model (LLM) endpoints by inspecting and validating incoming prompt messages. It checks the request content against user-defined allow and deny patterns, ensuring only approved input is forwarded to the upstream LLM. Depending on its configuration, the plugin can check either the latest message or the entire conversation history, and can be set to inspect prompts from all roles or only from end users.
+
+### Content Moderation
+
+8. **ai-aws-content-moderation**
+
+The [`ai-aws-content-moderation`](https://apisix.apache.org/zh/docs/apisix/plugins/ai-aws-content-moderation/) plugin integrates with AWS Comprehend to check the request body for toxic content, such as profanity, hate speech, insults, harassment, and violence, when proxying to large language models (LLMs), and rejects requests when the evaluation result exceeds the configured threshold.
+
+### Data Enhancement
+
+9. **ai-rag**
+
+The [`ai-rag`](https://apisix.apache.org/zh/docs/apisix/plugins/ai-rag/) plugin provides retrieval-augmented generation (RAG) capabilities for large language models (LLMs). It efficiently retrieves relevant documents or information from external data sources to enhance LLM responses, improving the accuracy and context relevance of the generated output. The plugin supports using Azure OpenAI and Azure AI Search services to generate embeddings and perform vector searches.
+
+## Observability
+
+APISIX has enhanced observability for AI applications, enabling real-time monitoring of key metrics such as time to first generated token (TTFT), token usage, and error rates. These capabilities help teams optimize costs, promptly identify performance issues, and ensure transparency through detailed logs and auditing mechanisms. Additionally, APISIX tracks token usage through access logs and observability components, effectively preventing API abuse and avoiding overbilling issues.
+
+## Summary
+
+In the [Apache APISIX 3.12.0](https://apisix.apache.org/zh/blog/2025/04/01/release-apache-apisix-3.12.0/) release, the APISIX AI Gateway has strengthened its AI support capabilities as a modern API gateway through a rich plugin ecosystem and flexible architectural design. It offers features such as proxy and request management, traffic control, prompt processing, content moderation, and data enhancement, supports integration with services like OpenAI and DeepSeek, and optimizes performance and security through mechanisms such as load balancing, rate limiting, and content filtering.
+
+Moreover, APISIX has enhanced observability, enabling real-time monitoring of key metrics to help teams optimize costs, identify performance issues, and ensure transparency. This release provides developers with a powerful, flexible, and secure platform for building and managing AI-driven applications.
