Copilot commented on code in PR #13049:
URL: https://github.com/apache/apisix/pull/13049#discussion_r2871751887
##########
apisix/plugins/ai-rate-limiting.lua:
##########
@@ -177,9 +181,23 @@ local function transform_limit_conf(plugin_conf, instance_conf, instance_name)
limit_conf._meta = plugin_conf._meta
limit_conf.count = limit
limit_conf.time_window = time_window
- limit_conf.limit_header = "X-AI-RateLimit-Limit-" .. name
- limit_conf.remaining_header = "X-AI-RateLimit-Remaining-" .. name
- limit_conf.reset_header = "X-AI-RateLimit-Reset-" .. name
+
+ -- standard_headers=true emits OpenRouter/OpenAI-compatible header names
+ -- that IDE plugins (Cursor/Continue) can recognize directly and use for back-off
+ if plugin_conf.standard_headers then
+ local strategy = plugin_conf.limit_strategy or "total_tokens"
+ local suffix = strategy == "total_tokens" and "Tokens"
+ or strategy == "prompt_tokens" and "PromptTokens"
+ or "CompletionTokens"
+ limit_conf.limit_header = "X-RateLimit-Limit-" .. suffix
+ limit_conf.remaining_header = "X-RateLimit-Remaining-" .. suffix
+ limit_conf.reset_header = "X-RateLimit-Reset-" .. suffix
+ else
+ limit_conf.limit_header = "X-AI-RateLimit-Limit-" .. name
+ limit_conf.remaining_header = "X-AI-RateLimit-Remaining-" .. name
+ limit_conf.reset_header = "X-AI-RateLimit-Reset-" .. name
+ end
Review Comment:
The newly added `standard_headers` option does not take effect when `conf.rules` is configured: `transform_limit_conf()` returns early for rules, and `limit-count` always sets `rule.header_prefix` (falling back to the rule index when unset), which forces `X-{prefix}-RateLimit-*` header names and ignores `conf.limit_header`/`remaining_header`/`reset_header`. Consider either (1) rejecting `standard_headers=true` when `rules` is set (via schema or custom validation) and documenting the limitation, or (2) reworking the rules path so the standard header names can be emitted.
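If option (1) is chosen, the cross-field check could sit next to the plugin's existing schema validation. A minimal sketch in plain Lua, with an illustrative helper name (`check_schema_extra`) that is not part of the actual patch:

```lua
-- Hypothetical sketch of option (1): reject the unsupported combination at
-- schema-check time. The plugin's real check_schema would run the JSON-schema
-- validation first; only the extra cross-field constraint is shown here.
local function check_schema_extra(conf)
    if conf.standard_headers and conf.rules then
        return false, "standard_headers is not supported when rules is configured"
    end
    return true
end

-- the unsupported combination is rejected up front
local ok, err = check_schema_extra({standard_headers = true, rules = {{}}})
print(ok, err)  -- false  standard_headers is not supported when rules is configured
```

Rejecting the combination keeps the documented header contract honest until the rules path can honor custom header names.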
##########
docs/en/latest/plugins/ai-rate-limiting-standard-headers-patch.md:
##########
@@ -0,0 +1,86 @@
+# ai-rate-limiting — `standard_headers` Parameter
+
+## Overview
+
+The `standard_headers` option makes `ai-rate-limiting` emit rate-limit response
+headers that follow the [OpenRouter / OpenAI convention][openrouter-headers],
+so IDE extensions such as **Cursor** and **Continue** can detect quota exhaustion
+and apply automatic back-off without any custom configuration.
+
+[openrouter-headers]: https://openrouter.ai/docs/api-reference/limits
+
+## New Parameter
+
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `standard_headers` | boolean | `false` | When `true`, emit OpenAI/OpenRouter-compatible rate-limit headers instead of the legacy `X-AI-RateLimit-*` headers. |
+
Review Comment:
This file documents `standard_headers`, but the main plugin documentation
(`docs/en/latest/plugins/ai-rate-limiting.md`) is still the canonical reference
and currently has no mention of `standard_headers` (and its header names) while
it explicitly documents `X-AI-RateLimit-*`. Please update `ai-rate-limiting.md`
directly (and consider removing this “patch” file or clearly
integrating/linking it) so users can discover the option in the standard docs.
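If the option is folded into `ai-rate-limiting.md`, the attributes row might look like the following (the column layout is an assumption here and should be matched to the canonical table in that file):

```markdown
| Name             | Type    | Required | Default | Description                                                                                                           |
| ---------------- | ------- | -------- | ------- | --------------------------------------------------------------------------------------------------------------------- |
| standard_headers | boolean | False    | false   | When true, emit OpenRouter/OpenAI-compatible `X-RateLimit-*-Tokens` headers instead of `X-AI-RateLimit-*-{instance}`. |
```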
##########
apisix/plugins/ai-drivers/openai-base.lua:
##########
@@ -131,10 +131,12 @@ local function read_response(conf, ctx, res, response_filter)
core.log.info("got token usage from ai service: ", core.json.delay_encode(data.usage))
ctx.llm_raw_usage = data.usage
+ local pt = data.usage.prompt_tokens or data.usage.input_tokens or 0
+ local ct = data.usage.completion_tokens or data.usage.output_tokens or 0
ctx.ai_token_usage = {
- prompt_tokens = data.usage.prompt_tokens or 0,
- completion_tokens = data.usage.completion_tokens or 0,
- total_tokens = data.usage.total_tokens or 0,
+ prompt_tokens = pt,
+ completion_tokens = ct,
+ total_tokens = data.usage.total_tokens or (pt + ct),
Review Comment:
This PR introduces support for `input_tokens`/`output_tokens` (and a derived
`total_tokens`) in the OpenAI driver, but that behavior change isn’t mentioned
in the PR description. Please either update the PR description/scope to include
it or move this change into a separate PR; also consider adding/adjusting tests
that cover responses using `input_tokens`/`output_tokens` to prevent
regressions.
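For the suggested tests, the fallback behavior the hunk introduces can be pinned down in isolation. A standalone sketch of that logic (the `normalize_usage` helper name is illustrative, not from the PR):

```lua
-- Normalize an LLM usage object: prompt/completion token counts fall back to
-- the Responses-API style input_tokens/output_tokens fields, and total_tokens
-- is derived from the two parts when the upstream omits it.
local function normalize_usage(usage)
    local pt = usage.prompt_tokens or usage.input_tokens or 0
    local ct = usage.completion_tokens or usage.output_tokens or 0
    return {
        prompt_tokens = pt,
        completion_tokens = ct,
        total_tokens = usage.total_tokens or (pt + ct),
    }
end

-- a response that only carries input_tokens/output_tokens
local u = normalize_usage({ input_tokens = 10, output_tokens = 5 })
print(u.prompt_tokens, u.completion_tokens, u.total_tokens)  -- 10  5  15
```

A regression test along these lines (one case per field-name variant, plus one with `total_tokens` absent) would cover the behavior change the review flags.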
##########
apisix/plugins/ai-rate-limiting.lua:
##########
@@ -65,6 +65,10 @@ local schema = {
default = "total_tokens",
description = "The strategy to limit the tokens"
},
+ -- Use OpenRouter/OpenAI-compatible standard header names that IDE plugins (Cursor/Continue) can recognize directly
+ -- true: X-RateLimit-Limit-Tokens / X-RateLimit-Remaining-Tokens / X-RateLimit-Reset-Tokens
+ -- false: X-AI-RateLimit-Limit-{instance} (original behavior)
+ standard_headers = {type = "boolean", default = false},
Review Comment:
The new inline comments for `standard_headers` are in Chinese, while the
surrounding file and English docs use English. To keep the codebase consistent
and accessible to all contributors, please translate these comments (or move
the explanation into the schema `description` field in English).
```suggestion
standard_headers = {
type = "boolean",
default = false,
            description = "Use OpenRouter/OpenAI-compatible standard rate limit header names (true: X-RateLimit-Limit-Tokens / X-RateLimit-Remaining-Tokens / X-RateLimit-Reset-Tokens; false: keep original behavior with X-AI-RateLimit-Limit-{instance})"
},
```
##########
t/plugin/ai-rate-limiting-standard-headers.t:
##########
@@ -0,0 +1,393 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+use t::APISIX 'no_plan';
+
+log_level("info");
+repeat_each(1);
+no_long_string();
+no_root_location();
+
+add_block_preprocessor(sub {
+ my ($block) = @_;
+
+ if (!defined $block->request) {
+ $block->set_value("request", "GET /t");
+ }
+
+ my $extra_yaml_config = <<_EOC_;
+plugins:
+ - ai-proxy-multi
+ - ai-rate-limiting
+ - prometheus
+_EOC_
+ $block->set_value("extra_yaml_config", $extra_yaml_config);
+
+ # Default mock LLM backend on port 6799
+ if (!defined $block->http_config) {
+ my $http_config = <<_EOC_;
+ server {
+ server_name mock-llm;
+ listen 6799;
+
+ default_type 'application/json';
+
+ location /v1/chat/completions {
+ content_by_lua_block {
+ ngx.status = 200
+ ngx.say([[{
+ "id": "chatcmpl-test",
+ "object": "chat.completion",
+ "choices": [{"index":0,"message":{"role":"assistant","content":"hi"},"finish_reason":"stop"}],
+ "usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}
+ }]])
+ }
+ }
+ }
+_EOC_
+ $block->set_value("http_config", $http_config);
+ }
+});
+
+run_tests();
+
+__DATA__
+
+=== TEST 1: schema check — standard_headers field is accepted
+--- apisix_yaml
+routes:
+ - id: 1
+ uri: /t
+ plugins:
+ ai-rate-limiting:
+ instances:
+ - name: mock-instance
+ limit: 1000
+ time_window: 60
+ limit_strategy: total_tokens
+ standard_headers: true
+ rejected_code: 429
+ upstream:
+ nodes:
+ "127.0.0.1:6799": 1
+ type: roundrobin
+#END
+--- config
+ location /t {
+ content_by_lua_block {
+ local plugin = require("apisix.plugins.ai-rate-limiting")
+ local ok, err = plugin.check_schema({
+ instances = {
+ { name = "mock-instance", limit = 1000, time_window = 60 }
+ },
+ limit_strategy = "total_tokens",
+ standard_headers = true,
+ rejected_code = 429,
+ })
+ if not ok then
+ ngx.say("schema error: ", err)
+ else
+ ngx.say("passed")
+ end
+ }
+ }
+--- response_body
+passed
+
+
+
+=== TEST 2: schema check — standard_headers defaults to false
+--- apisix_yaml
+routes:
+ - id: 1
+ uri: /t
+ plugins:
+ ai-rate-limiting:
+ instances:
+ - name: mock-instance
+ limit: 1000
+ time_window: 60
+ upstream:
+ nodes:
+ "127.0.0.1:6799": 1
+ type: roundrobin
+#END
+--- config
+ location /t {
+ content_by_lua_block {
+ local plugin = require("apisix.plugins.ai-rate-limiting")
+ local conf = {
+ instances = {
+ { name = "mock-instance", limit = 1000, time_window = 60 }
+ },
+ }
+ local ok, err = plugin.check_schema(conf)
+ if not ok then
+ ngx.say("schema error: ", err)
+ return
+ end
+ -- default should be false
+ if conf.standard_headers == false then
+ ngx.say("default is false")
+ else
+ ngx.say("unexpected default: ", tostring(conf.standard_headers))
+ end
+ }
+ }
+--- response_body
+default is false
+
+
+
+=== TEST 3: standard_headers=true returns X-RateLimit-Limit-Tokens header
+--- apisix_yaml
+routes:
+ - id: 1
+ uri: /anything
+ plugins:
+ ai-proxy-multi:
+ instances:
+ - name: mock-instance
+ provider: openai
+ weight: 1
+ auth:
+ header:
+ Authorization: "Bearer test-key"
+ options:
+ model: gpt-4o-mini
+ override:
+ endpoint: "http://localhost:6799/v1/chat/completions"
+ ssl_verify: false
+ ai-rate-limiting:
+ instances:
+ - name: mock-instance
+ limit: 10000
+ time_window: 60
+ limit_strategy: total_tokens
+ standard_headers: true
+ rejected_code: 429
+#END
+--- request
+POST /anything
+{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}
+--- more_headers
+Content-Type: application/json
+apikey: test-key-123
+--- error_code: 200
+--- response_headers_like
+X-RateLimit-Limit-Tokens: \d+
+X-RateLimit-Remaining-Tokens: \d+
+X-RateLimit-Reset-Tokens: \d+
+
+
+
+=== TEST 4: standard_headers=true, 429 response has Remaining-Tokens: 0
+--- apisix_yaml
+routes:
+ - id: 1
+ uri: /anything
+ plugins:
+ ai-proxy-multi:
+ instances:
+ - name: mock-instance
+ provider: openai
+ weight: 1
+ auth:
+ header:
+ Authorization: "Bearer test-key"
+ options:
+ model: gpt-4o-mini
+ override:
+ endpoint: "http://localhost:6799/v1/chat/completions"
+ ssl_verify: false
+ ai-rate-limiting:
+ instances:
+ - name: mock-instance
+ limit: 1
+ time_window: 60
+ limit_strategy: total_tokens
+ standard_headers: true
+ rejected_code: 429
+#END
+--- config
+ location /t {
+ content_by_lua_block {
+ local http = require("resty.http")
+ local httpc = http.new()
+
+ -- First request: should succeed and consume the 1-token budget
+ local res1, err = httpc:request_uri("http://127.0.0.1:" .. ngx.var.server_port .. "/anything", {
+ method = "POST",
+ headers = {
+ ["Content-Type"] = "application/json",
+ ["apikey"] = "test-key-123",
+ },
+ body = [[{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}]],
Review Comment:
The comment says the first request will “consume the 1-token budget”, but
the mocked upstream returns `total_tokens: 15`. This can confuse future readers
when interpreting why the second request is rate-limited. Consider rewording
the comment to reflect that the first request exceeds the configured limit and
drives Remaining to 0 for subsequent requests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]