AlinsRan opened a new pull request, #13487: URL: https://github.com/apache/apisix/pull/13487
### Description

This PR adds three built-in Prometheus histogram metrics for the AI Gateway, complementing the existing LLM metrics:

- **`apisix_llm_ttft`**: LLM time to first token (milliseconds), observed for streaming (`ai_stream`) requests only. The existing `apisix_llm_latency` mixes streaming TTFT and non-streaming total latency in one series; this dedicated metric keeps the TTFT distribution semantically consistent so it can be used for streaming latency SLOs.
- **`apisix_llm_prompt_tokens_dist`** / **`apisix_llm_completion_tokens_dist`**: histograms of prompt/completion tokens per request. The existing `apisix_llm_prompt_tokens` / `apisix_llm_completion_tokens` are counters (totals only); these histograms add a distribution so quantiles such as p95 prompt size can be computed.

Buckets are configurable via `plugin_attr.prometheus`:

- `llm_ttft_buckets` (unit: millisecond, defaults to the standard latency buckets)
- `llm_prompt_tokens_buckets` / `llm_completion_tokens_buckets` (unit: token)

Token histogram default buckets are tuned to real-world token ranges with the upper bound at 1M to cover large-context models. The OTel GenAI semantic conventions define the instrument type and unit for these but do not prescribe bucket boundaries, so the defaults are chosen empirically and remain overridable.

The existing counters, gauge, and `apisix_llm_latency` are left unchanged for backward compatibility.

#### Which issue(s) this PR fixes:

Fixes #

### Checklist

- [x] I have explained the need for this PR and the problem it solves
- [x] I have explained the changes or the new features added to this PR
- [x] I have added tests corresponding to this change
- [x] I have updated the documentation to reflect this change
- [x] I have verified that this change is backward compatible
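
For readers wondering why a histogram is needed on top of the existing counters, the following is a minimal Python sketch of how a quantile can be estimated from cumulative bucket counts. It uses the same linear-interpolation idea that PromQL's `histogram_quantile` applies to classic histograms. The function name, bucket boundaries, and counts below are hypothetical illustration values, not the defaults shipped by this PR.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, ending with the +Inf bucket, as in a Prometheus histogram.
    """
    if not buckets or not 0 <= q <= 1:
        raise ValueError("need at least one bucket and 0 <= q <= 1")

    total = buckets[-1][1]  # the +Inf bucket holds the total observation count
    if total == 0:
        return math.nan

    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # The quantile falls in the +Inf bucket: report the highest
                # finite bound, since no upper limit is known.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            if count == prev_count:
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return math.nan


# Hypothetical prompt-token buckets (unit: token) with cumulative counts.
example_buckets = [(100, 10), (1000, 60), (10000, 95), (math.inf, 100)]

p50 = histogram_quantile(0.50, example_buckets)
p95 = histogram_quantile(0.95, example_buckets)
print(f"p50 prompt tokens ~ {p50:.0f}, p95 prompt tokens ~ {p95:.0f}")
```

A counter only exposes the running total, so it cannot answer this kind of question. The estimate is also only as precise as the bucket boundaries, which is one reason configurable buckets matter.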
