AlinsRan opened a new pull request, #13477:
URL: https://github.com/apache/apisix/pull/13477

   ## Summary
   
   Add 8 built-in nginx variables that `ai-proxy` sets automatically on every 
request. These variables expose LLM request/response metadata in the nginx 
access log and all logger plugins without any additional configuration.
   
   ### New variables
   
   | Variable | Type | Description |
   |---|---|---|
   | `$llm_total_tokens` | integer | Total tokens (prompt + completion) |
   | `$llm_stream` | boolean | `true` if request is streaming |
   | `$llm_has_tool_calls` | boolean | `true` if response contains tool calls |
   | `$llm_tool_count` | integer | Number of tools in the upstream request body |
   | `$llm_end_user_id` | string | End-user identifier from request body |
   | `$llm_cache_read_input_tokens` | integer | Prompt tokens served from provider cache |
   | `$llm_cache_creation_input_tokens` | integer | Prompt tokens written to provider cache |
   | `$llm_reasoning_tokens` | integer | Reasoning tokens (OpenAI o1/o3, Responses API) |
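
To illustrate how the request- and response-derived values in the table relate to the JSON bodies, here is a minimal sketch in Python. It is not the PR's Lua code: the function name `derive_llm_vars` is hypothetical, and it assumes OpenAI Chat style field names (`stream`, `tools`, `usage.prompt_tokens`, `usage.completion_tokens`, `choices[].message.tool_calls`).

```python
from typing import Any


def derive_llm_vars(request_body: dict[str, Any],
                    response_body: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: compute four llm_* values from OpenAI Chat style bodies."""
    usage = response_body.get("usage") or {}
    # Total tokens is prompt + completion, as described in the table.
    total = usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)

    # The response "has tool calls" if any choice carries a non-empty tool_calls list.
    has_tool_calls = any(
        (choice.get("message") or {}).get("tool_calls")
        for choice in response_body.get("choices") or []
    )

    return {
        "llm_total_tokens": total,
        "llm_stream": request_body.get("stream") is True,
        "llm_has_tool_calls": has_tool_calls,
        # Number of tool definitions sent in the request body.
        "llm_tool_count": len(request_body.get("tools") or []),
    }
```

A request with no `tools` array and no `stream` flag yields `llm_tool_count` 0 and `llm_stream` false in this sketch.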
   
   ### Provider mapping for cache/reasoning tokens
   
   | Variable | OpenAI Chat | OpenAI Responses | Anthropic | DeepSeek |
   |---|---|---|---|---|
   | `llm_cache_read_input_tokens` | `prompt_tokens_details.cached_tokens` | `input_tokens_details.cached_tokens` | `cache_read_input_tokens` | `prompt_cache_hit_tokens` |
   | `llm_cache_creation_input_tokens` | — | — | `cache_creation_input_tokens` | — |
   | `llm_reasoning_tokens` | `completion_tokens_details.reasoning_tokens` | `output_tokens_details.reasoning_tokens` | — | — |
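
The mapping table above can be read as a lookup from provider to a path inside the provider's `usage` object. The following Python sketch encodes it that way. It is only an illustration, not the `extract_usage` code in the Lua adapters: `USAGE_PATHS`, `extract_token_details`, and the provider keys (`openai-chat`, `openai-responses`, `anthropic-messages`, `deepseek`) are hypothetical labels chosen for this example.

```python
from typing import Any

# Provider label -> variable -> path into the `usage` object (from the table above).
USAGE_PATHS: dict[str, dict[str, tuple[str, ...]]] = {
    "openai-chat": {
        "llm_cache_read_input_tokens": ("prompt_tokens_details", "cached_tokens"),
        "llm_reasoning_tokens": ("completion_tokens_details", "reasoning_tokens"),
    },
    "openai-responses": {
        "llm_cache_read_input_tokens": ("input_tokens_details", "cached_tokens"),
        "llm_reasoning_tokens": ("output_tokens_details", "reasoning_tokens"),
    },
    "anthropic-messages": {
        "llm_cache_read_input_tokens": ("cache_read_input_tokens",),
        "llm_cache_creation_input_tokens": ("cache_creation_input_tokens",),
    },
    "deepseek": {
        "llm_cache_read_input_tokens": ("prompt_cache_hit_tokens",),
    },
}


def extract_token_details(provider: str, usage: dict[str, Any]) -> dict[str, int]:
    """Return only the variables whose source field is present as an integer."""
    result: dict[str, int] = {}
    for var, path in USAGE_PATHS.get(provider, {}).items():
        node: Any = usage
        for key in path:
            # Walk one level down; stop yielding values once a level is missing.
            node = node.get(key) if isinstance(node, dict) else None
        if isinstance(node, int):
            result[var] = node
    return result
```

Fields that a provider does not report are simply absent from the result, which mirrors the empty cells in the table.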
   
   End-user ID extraction: for OpenAI and OpenAI-compatible requests, `safety_identifier` takes precedence over `user`; for Anthropic Messages requests, `metadata.user_id` is used.
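
As a rough illustration of that precedence, the Python sketch below checks the candidate fields in order and returns the first non-empty string. The function name `extract_end_user_id` is hypothetical, and checking all three fields regardless of protocol is an assumption made for brevity; the actual Lua logic in `apisix/plugins/ai-proxy/base.lua` may branch on the protocol instead.

```python
from typing import Any, Optional


def extract_end_user_id(body: dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: pick the end-user identifier from a request body."""
    metadata = body.get("metadata")
    # Candidates in precedence order: safety_identifier, then user,
    # then metadata.user_id (the Anthropic Messages field).
    candidates = (
        body.get("safety_identifier"),
        body.get("user"),
        metadata.get("user_id") if isinstance(metadata, dict) else None,
    )
    for value in candidates:
        if isinstance(value, str) and value:
            return value
    return None
```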
   
   ### Example access log usage
   
   ```nginx
   log_format main '$llm_model $llm_total_tokens $llm_stream $llm_has_tool_calls $llm_end_user_id';
   ```
   
   ### Also includes
   
   - Optional `on_event` callback parameter added to `parse_streaming_response` 
for per-event processing (e.g., streaming tool call detection).
   - `extract_usage` updated across all three protocol adapters to extract 
cache and reasoning token counts from provider responses.
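
The per-event callback idea can be sketched as follows in Python. This is not the signature of the Lua `parse_streaming_response`: `parse_sse_events` and `ToolCallDetector` are hypothetical names, and the sketch assumes OpenAI Chat style server-sent events (`data: {...}` lines terminated by `data: [DONE]`, with tool calls under `choices[].delta.tool_calls`).

```python
import json
from typing import Any, Callable, Iterable, Optional


def parse_sse_events(
    lines: Iterable[str],
    on_event: Optional[Callable[[Any], None]] = None,
) -> list[Any]:
    """Decode `data:` lines into JSON events, invoking on_event for each one."""
    events: list[Any] = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore events that are not valid JSON
        events.append(event)
        if on_event is not None:
            on_event(event)
    return events


class ToolCallDetector:
    """Callback that records whether any streamed delta carried tool calls."""

    def __init__(self) -> None:
        self.has_tool_calls = False

    def __call__(self, event: Any) -> None:
        if not isinstance(event, dict):
            return
        for choice in event.get("choices") or []:
            if (choice.get("delta") or {}).get("tool_calls"):
                self.has_tool_calls = True
```

Passing a `ToolCallDetector` instance as `on_event` lets the caller learn about tool calls while the stream is still being consumed, without a second pass over the collected events.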
   
   ## Files changed
   
   | File | Change |
   |---|---|
   | `apisix/cli/ngx_tpl.lua` | Add 8 `set $llm_*` variable definitions |
   | `apisix/core/ctx.lua` | Register 8 variables in `ngx_var_names` |
   | `apisix/plugins/ai-protocols/openai-chat.lua` | Extract cache/reasoning tokens from `prompt_tokens_details` / `completion_tokens_details` |
   | `apisix/plugins/ai-protocols/openai-responses.lua` | Extract cache/reasoning tokens from `input_tokens_details` / `output_tokens_details` |
   | `apisix/plugins/ai-protocols/anthropic-messages.lua` | Extract cache tokens from Anthropic usage fields |
   | `apisix/plugins/ai-providers/base.lua` | Set new variables from usage; add `on_event` callback |
   | `apisix/plugins/ai-proxy/base.lua` | Compute `llm_stream`, `llm_tool_count`, `llm_end_user_id`, `llm_has_tool_calls`, `llm_total_tokens` |
   | `t/APISIX.pm` | Add `set` directives + extend `log_format main` |
   | `t/plugin/ai-proxy3.t` | 6 test cases (TEST 7–12) covering all new variables |
   
   ## Test plan
   
   ```bash
   prove -I. -r t/plugin/ai-proxy3.t
   ```

