[PR] feat(prometheus): add built-in LLM histograms for TTFT and token distribution [apisix]

via GitHub Mon, 08 Jun 2026 16:31:10 -0700


AlinsRan opened a new pull request, #13487:
URL: https://github.com/apache/apisix/pull/13487


   ### Description
   
   This PR adds three built-in Prometheus histogram metrics for the AI Gateway, 
complementing the existing LLM metrics:
   
   - **`apisix_llm_ttft`**: LLM time to first token, in milliseconds, observed 
for streaming (`ai_stream`) requests only. The existing `apisix_llm_latency` 
records streaming TTFT and non-streaming total latency in the same series; this 
dedicated metric restricts the distribution to TTFT alone, so it can be used 
for streaming latency SLOs.
   - **`apisix_llm_prompt_tokens_dist`** / 
**`apisix_llm_completion_tokens_dist`**: histograms of prompt and completion 
token counts per request. The existing `apisix_llm_prompt_tokens` / 
`apisix_llm_completion_tokens` are counters and expose totals only; the 
histograms add a per-request distribution, so quantiles such as the p95 prompt 
size can be computed.
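
To illustrate why a histogram makes quantiles computable where a counter does not, here is a minimal Python sketch that estimates a quantile from cumulative bucket counts by linear interpolation. It is similar in spirit to PromQL's `histogram_quantile`, not a copy of it. The function name `estimate_quantile` and the sample bucket boundaries and counts are assumptions made up for illustration; they are not taken from this PR.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound and ending with math.inf, mirroring the `le` labels of a
    Prometheus histogram. Values inside a bucket are assumed to be spread
    uniformly, and the lowest bucket is assumed to start at 0.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be within [0, 1]")
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least two buckets, the last one +Inf")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded yet

    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # The quantile lies beyond the last finite bucket, so the
                # best available answer is the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Interpolate linearly inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative counts for a prompt-token histogram (100 requests).
sample_buckets = [
    (100, 10),
    (1000, 60),
    (10000, 95),
    (100000, 100),
    (math.inf, 100),
]

if __name__ == "__main__":
    print("p50 prompt tokens:", estimate_quantile(0.50, sample_buckets))
    print("p95 prompt tokens:", estimate_quantile(0.95, sample_buckets))
```

The estimate is only as precise as the bucket layout: a quantile that falls in a wide bucket is interpolated across the whole bucket, which is one reason the bucket boundaries are worth tuning.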
   
   Buckets are configurable via `plugin_attr.prometheus`:
   - `llm_ttft_buckets` (unit: millisecond, defaults to the standard latency 
buckets)
   - `llm_prompt_tokens_buckets` / `llm_completion_tokens_buckets` (unit: token)
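
As a rough sketch of how such an override might look in `conf/config.yaml`: the key names below come from this description, while the list-of-numbers format is an assumption based on how the existing `default_buckets` attribute is written. The values are purely illustrative and are not the defaults shipped by this PR.

```yaml
plugin_attr:
  prometheus:
    # TTFT buckets, in milliseconds (illustrative values only)
    llm_ttft_buckets: [50, 100, 250, 500, 1000, 2500, 5000, 10000]
    # Token-count buckets, in tokens (illustrative values only)
    llm_prompt_tokens_buckets: [100, 1000, 10000, 100000, 1000000]
    llm_completion_tokens_buckets: [10, 100, 1000, 10000, 100000]
```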
   
   Default buckets for the token histograms are tuned to real-world token 
ranges, with an upper bound of 1M tokens to cover large-context models. The 
OTel GenAI semantic conventions define the instrument type and unit for such 
metrics but do not prescribe bucket boundaries, so the defaults are chosen 
empirically and remain overridable.
   
   The existing counters, gauge, and `apisix_llm_latency` are left unchanged 
for backward compatibility.
   
   #### Which issue(s) this PR fixes:
   Fixes #
   
   ### Checklist
   
   - [x] I have explained the need for this PR and the problem it solves
   - [x] I have explained the changes or the new features added to this PR
   - [x] I have added tests corresponding to this change
   - [x] I have updated the documentation to reflect this change
   - [x] I have verified that this change is backward compatible
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
