Copilot commented on code in PR #13049:
URL: https://github.com/apache/apisix/pull/13049#discussion_r2871751887
##########
apisix/plugins/ai-rate-limiting.lua:
##########
@@ -177,9 +181,23 @@ local function transform_limit_conf(plugin_conf, instance_conf, instance_name)
limit_conf._meta = plugin_conf._meta
limit_conf.count = limit
limit_conf.time_window = time_window
- limit_conf.limit_header = "X-AI-RateLimit-Limit-" .. name
- limit_conf.remaining_header = "X-AI-RateLimit-Remaining-" .. name
- limit_conf.reset_header = "X-AI-RateLimit-Reset-" .. name
+
+ -- standard_headers=true emits OpenRouter/OpenAI-compatible header names
+ -- that IDE plugins (Cursor/Continue) can recognize directly and use for back-off
+ if plugin_conf.standard_headers then
+ local strategy = plugin_conf.limit_strategy or "total_tokens"
+ local suffix = strategy == "total_tokens" and "Tokens"
+ or strategy == "prompt_tokens" and "PromptTokens"
+ or "CompletionTokens"
+ limit_conf.limit_header = "X-RateLimit-Limit-" .. suffix
+ limit_conf.remaining_header = "X-RateLimit-Remaining-" .. suffix
+ limit_conf.reset_header = "X-RateLimit-Reset-" .. suffix
+ else
+ limit_conf.limit_header = "X-AI-RateLimit-Limit-" .. name
+ limit_conf.remaining_header = "X-AI-RateLimit-Remaining-" .. name
+ limit_conf.reset_header = "X-AI-RateLimit-Reset-" .. name
+ end
Review Comment:
The newly added `standard_headers` option does not take effect when `conf.rules` is configured: `transform_limit_conf()` returns early for rules, and `limit-count` always sets `rule.header_prefix` (falling back to the rule index when unset), which forces `X-{prefix}-RateLimit-*` header names and ignores `conf.limit_header`/`remaining_header`/`reset_header`. Consider either (1) rejecting `standard_headers=true` when `rules` is set (via schema or custom validation) and documenting the limitation, or (2) reworking the rules path so the standard header names can be emitted.
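If option (1) is chosen, the cross-field check could sit next to the plugin's existing schema validation. A minimal sketch in plain Lua, with an illustrative helper name (`check_schema_extra`) that is not part of the actual patch:

```lua
-- Hypothetical sketch of option (1): reject the unsupported combination at
-- schema-check time. The plugin's real check_schema would run the JSON-schema
-- validation first; only the extra cross-field constraint is shown here.
local function check_schema_extra(conf)
    if conf.standard_headers and conf.rules then
        return false, "standard_headers is not supported when rules is configured"
    end
    return true
end

-- the unsupported combination is rejected up front
local ok, err = check_schema_extra({standard_headers = true, rules = {{}}})
print(ok, err)  -- false  standard_headers is not supported when rules is configured
```

Rejecting the combination keeps the documented header contract honest until the rules path can honor custom header names.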
##########
docs/en/latest/plugins/ai-rate-limiting-standard-headers-patch.md:
##########
@@ -0,0 +1,86 @@
+# ai-rate-limiting — `standard_headers` Parameter
+
+## Overview
+
+The `standard_headers` option makes `ai-rate-limiting` emit rate-limit response
+headers that follow the [OpenRouter / OpenAI convention][openrouter-headers],
+so IDE extensions such as **Cursor** and **Continue** can detect quota exhaustion
+and apply automatic back-off without any custom configuration.
+
+[openrouter-headers]: https://openrouter.ai/docs/api-reference/limits
+
+## New Parameter
+
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `standard_headers` | boolean | `false` | When `true`, emit OpenAI/OpenRouter-compatible rate-limit headers instead of the legacy `X-AI-RateLimit-*` headers. |
+
Review Comment:
This file documents `standard_headers`, but the main plugin documentation
(`docs/en/latest/plugins/ai-rate-limiting.md`) is still the canonical reference
and currently has no mention of `standard_headers` (and its header names) while
it explicitly documents `X-AI-RateLimit-*`. Please update `ai-rate-limiting.md`
directly (and consider removing this “patch” file or clearly
integrating/linking it) so users can discover the option in the standard docs.
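If the option is folded into `ai-rate-limiting.md`, the attributes row might look like the following (the column layout is an assumption here and should be matched to the canonical table in that file):

```markdown
| Name             | Type    | Required | Default | Description                                                                                                           |
| ---------------- | ------- | -------- | ------- | --------------------------------------------------------------------------------------------------------------------- |
| standard_headers | boolean | False    | false   | When true, emit OpenRouter/OpenAI-compatible `X-RateLimit-*-Tokens` headers instead of `X-AI-RateLimit-*-{instance}`. |
```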
##########
apisix/plugins/ai-drivers/openai-base.lua:
##########
@@ -131,10 +131,12 @@ local function read_response(conf, ctx, res, response_filter)
core.log.info("got token usage from ai service: ", core.json.delay_encode(data.usage))
ctx.llm_raw_usage = data.usage
+ local pt = data.usage.prompt_tokens or data.usage.input_tokens or 0
+ local ct = data.usage.completion_tokens or data.usage.output_tokens or 0
ctx.ai_token_usage = {
- prompt_tokens = data.usage.prompt_tokens or 0,
- completion_tokens = data.usage.completion_tokens or 0,
- total_tokens = data.usage.total_tokens or 0,
+ prompt_tokens = pt,
+ completion_tokens = ct,
+ total_tokens = data.usage.total_tokens or (pt + ct),
Review Comment:
This PR introduces support for `input_tokens`/`output_tokens` (and a derived
`total_tokens`) in the OpenAI driver, but that behavior change isn’t mentioned
in the PR description. Please either update the PR description/scope to include
it or move this change into a separate PR; also consider adding/adjusting tests
that cover responses using `input_tokens`/`output_tokens` to prevent
regressions.
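For the suggested tests, the fallback behavior the hunk introduces can be pinned down in isolation. A standalone sketch of that logic (the `normalize_usage` helper name is illustrative, not from the PR):

```lua
-- Normalize an LLM usage object: prompt/completion token counts fall back to
-- the Responses-API style input_tokens/output_tokens fields, and total_tokens
-- is derived from the two parts when the upstream omits it.
local function normalize_usage(usage)
    local pt = usage.prompt_tokens or usage.input_tokens or 0
    local ct = usage.completion_tokens or usage.output_tokens or 0
    return {
        prompt_tokens = pt,
        completion_tokens = ct,
        total_tokens = usage.total_tokens or (pt + ct),
    }
end

-- a response that only carries input_tokens/output_tokens
local u = normalize_usage({ input_tokens = 10, output_tokens = 5 })
print(u.prompt_tokens, u.completion_tokens, u.total_tokens)  -- 10  5  15
```

A regression test along these lines (one case per field-name variant, plus one with `total_tokens` absent) would cover the behavior change the review flags.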
##########
apisix/plugins/ai-rate-limiting.lua:
##########
@@ -65,6 +65,10 @@ local schema = {
default = "total_tokens",
description = "The strategy to limit the tokens"
},
+ -- Use OpenRouter/OpenAI-compatible standard header names that IDE plugins (Cursor/Continue) can recognize directly
+ -- true: X-RateLimit-Limit-Tokens / X-RateLimit-Remaining-Tokens / X-RateLimit-Reset-Tokens
+ -- false: X-AI-RateLimit-Limit-{instance} (original behavior)
+ standard_headers = {type = "boolean", default = false},
Review Comment:
The new inline comments for `standard_headers` are in Chinese, while the
surrounding file and English docs use English. To keep the codebase consistent
and accessible to all contributors, please translate these comments (or move
the explanation into the schema `description` field in English).
```suggestion
standard_headers = {
type = "boolean",
default = false,
            description = "Use OpenRouter/OpenAI-compatible standard rate limit header names (true: X-RateLimit-Limit-Tokens / X-RateLimit-Remaining-Tokens / X-RateLimit-Reset-Tokens; false: keep original behavior with X-AI-RateLimit-Limit-{instance})"
},
```
##########
t/plugin/ai-rate-limiting-standard-headers.t:
##########
@@ -0,0 +1,393 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+use t::APISIX 'no_plan';
+
+log_level("info");
+repeat_each(1);
+no_long_string();
+no_root_location();
+
+add_block_preprocessor(sub {
+ my ($block) = @_;
+
+ if (!defined $block->request) {
+ $block->set_value("request", "GET /t");
+ }
+
+ my $extra_yaml_config = <<_EOC_;
+plugins:
+ - ai-proxy-multi
+ - ai-rate-limiting
+ - prometheus
+_EOC_
+ $block->set_value("extra_yaml_config", $extra_yaml_config);
+
+ # Default mock LLM backend on port 6799
+ if (!defined $block->http_config) {
+ my $http_config = <<_EOC_;
+ server {
+ server_name mock-llm;
+ listen 6799;
+
+ default_type 'application/json';
+
+ location /v1/chat/completions {
+ content_by_lua_block {
+ ngx.status = 200
+ ngx.say([[{
+ "id": "chatcmpl-test",
+ "object": "chat.completion",
+ "choices": [{"index":0,"message":{"role":"assistant","content":"hi"},"finish_reason":"stop"}],
+ "usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}
+ }]])
+ }
+ }
+ }
+_EOC_
+ $block->set_value("http_config", $http_config);
+ }
+});
+
+run_tests();
+
+__DATA__
+
+=== TEST 1: schema check — standard_headers field is accepted
+--- apisix_yaml
+routes:
+ - id: 1
+ uri: /t
+ plugins:
+ ai-rate-limiting:
+ instances:
+ - name: mock-instance
+ limit: 1000
+ time_window: 60
+ limit_strategy: total_tokens
+ standard_headers: true
+ rejected_code: 429
+ upstream:
+ nodes:
+ "127.0.0.1:6799": 1
+ type: roundrobin
+#END
+--- config
+ location /t {
+ content_by_lua_block {
+ local plugin = require("apisix.plugins.ai-rate-limiting")
+ local ok, err = plugin.check_schema({
+ instances = {
+ { name = "mock-instance", limit = 1000, time_window = 60 }
+ },
+ limit_strategy = "total_tokens",
+ standard_headers = true,
+ rejected_code = 429,
+ })
+ if not ok then
+ ngx.say("schema error: ", err)
+ else
+ ngx.say("passed")
+ end
+ }
+ }
+--- response_body
+passed
+
+
+
+=== TEST 2: schema check — standard_headers defaults to false
+--- apisix_yaml
+routes:
+ - id: 1
+ uri: /t
+ plugins:
+ ai-rate-limiting:
+ instances:
+ - name: mock-instance
+ limit: 1000
+ time_window: 60
+ upstream:
+ nodes:
+ "127.0.0.1:6799": 1
+ type: roundrobin
+#END
+--- config
+ location /t {
+ content_by_lua_block {
+ local plugin = require("apisix.plugins.ai-rate-limiting")
+ local conf = {
+ instances = {
+ { name = "mock-instance", limit = 1000, time_window = 60 }
+ },
+ }
+ local ok, err = plugin.check_schema(conf)
+ if not ok then
+ ngx.say("schema error: ", err)
+ return
+ end
+ -- default should be false
+ if conf.standard_headers == false then
+ ngx.say("default is false")
+ else
+ ngx.say("unexpected default: ", tostring(conf.standard_headers))
+ end
+ }
+ }
+--- response_body
+default is false
+
+
+
+=== TEST 3: standard_headers=true returns X-RateLimit-Limit-Tokens header
+--- apisix_yaml
+routes:
+ - id: 1
+ uri: /anything
+ plugins:
+ ai-proxy-multi:
+ instances:
+ - name: mock-instance
+ provider: openai
+ weight: 1
+ auth:
+ header:
+ Authorization: "Bearer test-key"
+ options:
+ model: gpt-4o-mini
+ override:
+ endpoint: "http://localhost:6799/v1/chat/completions"
+ ssl_verify: false
+ ai-rate-limiting:
+ instances:
+ - name: mock-instance
+ limit: 10000
+ time_window: 60
+ limit_strategy: total_tokens
+ standard_headers: true
+ rejected_code: 429
+#END
+--- request
+POST /anything
+{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}
+--- more_headers
+Content-Type: application/json
+apikey: test-key-123
+--- error_code: 200
+--- response_headers_like
+X-RateLimit-Limit-Tokens: \d+
+X-RateLimit-Remaining-Tokens: \d+
+X-RateLimit-Reset-Tokens: \d+
+
+
+
+=== TEST 4: standard_headers=true, 429 response has Remaining-Tokens: 0
+--- apisix_yaml
+routes:
+ - id: 1
+ uri: /anything
+ plugins:
+ ai-proxy-multi:
+ instances:
+ - name: mock-instance
+ provider: openai
+ weight: 1
+ auth:
+ header:
+ Authorization: "Bearer test-key"
+ options:
+ model: gpt-4o-mini
+ override:
+ endpoint: "http://localhost:6799/v1/chat/completions"
+ ssl_verify: false
+ ai-rate-limiting:
+ instances:
+ - name: mock-instance
+ limit: 1
+ time_window: 60
+ limit_strategy: total_tokens
+ standard_headers: true
+ rejected_code: 429
+#END
+--- config
+ location /t {
+ content_by_lua_block {
+ local http = require("resty.http")
+ local httpc = http.new()
+
+ -- First request: should succeed and consume the 1-token budget
+ local res1, err = httpc:request_uri("http://127.0.0.1:" .. ngx.var.server_port .. "/anything", {
+ method = "POST",
+ headers = {
+ ["Content-Type"] = "application/json",
+ ["apikey"] = "test-key-123",
+ },
+ body = [[{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}]],
Review Comment:
The comment says the first request will “consume the 1-token budget”, but
the mocked upstream returns `total_tokens: 15`. This can confuse future readers
when interpreting why the second request is rate-limited. Consider rewording
the comment to reflect that the first request exceeds the configured limit and
drives Remaining to 0 for subsequent requests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]