This is an automated email from the ASF dual-hosted git repository.

wusheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/skywalking-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 02db835d0db Add the blog: Virtual GenAI Observability (#828)
02db835d0db is described below

commit 02db835d0db4c4360f6c1ce716117714c0eb8ea3
Author: peachisai <[email protected]>
AuthorDate: Mon Apr 6 13:33:34 2026 +0800

    Add the blog: Virtual GenAI Observability (#828)
---
 .../2026-04-05-virtual-genai-monitoring/index.md   | 244 +++++++++++++++++++++
 .../model-dashboard-1.png                          | Bin 0 -> 20892 bytes
 .../model-dashboard-2.png                          | Bin 0 -> 111377 bytes
 .../model-dashboard-3.png                          | Bin 0 -> 120768 bytes
 .../otlp-estimated-tag.png                         | Bin 0 -> 2960 bytes
 .../provider-dashboard-1.png                       | Bin 0 -> 29315 bytes
 .../provider-dashboard-2.png                       | Bin 0 -> 109436 bytes
 .../provider-dashboard-3.png                       | Bin 0 -> 53573 bytes
 .../2026-04-05-virtual-genai-monitoring/index.md   | 240 ++++++++++++++++++++
 .../model-dashboard-1.png                          | Bin 0 -> 20892 bytes
 .../model-dashboard-2.png                          | Bin 0 -> 111377 bytes
 .../model-dashboard-3.png                          | Bin 0 -> 120768 bytes
 .../otlp-estimated-tag.png                         | Bin 0 -> 2960 bytes
 .../provider-dashboard-1.png                       | Bin 0 -> 29315 bytes
 .../provider-dashboard-2.png                       | Bin 0 -> 109436 bytes
 .../provider-dashboard-3.png                       | Bin 0 -> 53573 bytes
 16 files changed, 484 insertions(+)

diff --git a/content/blog/2026-04-05-virtual-genai-monitoring/index.md 
b/content/blog/2026-04-05-virtual-genai-monitoring/index.md
new file mode 100644
index 00000000000..510aea99b78
--- /dev/null
+++ b/content/blog/2026-04-05-virtual-genai-monitoring/index.md
@@ -0,0 +1,244 @@
+---
+title: "Monitoring LLM Applications with SkyWalking 10.4: Insights into 
Performance and Cost"
+author: "YiMing Shao"
+date: 2026-04-05
+description: "SkyWalking 10.4 introduces the Virtual GenAI dashboard. This 
article explores how to use this new feature to monitor and optimize LLM 
integrations."
+tags:
+- GenAI
+- LLM
+- Observability
+- OpenTelemetry
+---
+# The Problem: As Applications "Consume" LLMs, Monitoring Leaves a Blind Spot
+
+With the deep penetration of Generative AI (GenAI) into enterprise workflows, 
developers face a challenging paradox: while powerful LLM capabilities are 
easily integrated via `Spring AI` or `OpenAI SDKs`, the actual performance and 
reliability of these calls remain largely invisible.
+
+### 1. The "Black Box" of Cost and Performance: Is the Expensive Model Worth 
It?
+Facing high LLM bills, organizations often only see a total sum paid to a 
provider, but cannot calculate the "ROI" within the application.
+* **Blind Upgrades:** You might switch to a premium flagship model for a 
better experience. But in your specific business scenario, does paying several 
times more per token actually yield lower latency or a faster **TTFT (Time to 
First Token)**?
+* **Lack of Real-World Benchmarks:** Official benchmarks mean little without 
your real-world business requests. You need to know which model achieves the 
perfect balance between "Token/Cost Consumption" and "Response Speed" under 
your actual prompt lengths and concurrency levels.
+
+### 2. The Vanishing "Golden Timeout"
+Many teams set timeouts for LLM calls arbitrarily (e.g., 30s or 60s).
+* **Too Short:** During peak periods or long-text generation, requests are 
frequently interrupted, causing business failure rates to soar.
+* **Too Long:** If a provider hangs, requests pile up in memory, blocking 
execution threads and potentially leading to the collapse of the entire Java 
application or microservice cluster.
+
+Only by mastering the **P99/P95 latency** can you set rational timeout policies based on data rather than intuition.
+
+### 3. The Overlooked Experience Killer: TTFT
+In GenAI scenarios, a user's perception of speed depends less on the total duration of the conversation and more on **"when the first word appears."**
+* A streaming response with a 10s total duration but a **500ms TTFT** feels instantaneous.
+* A non-streaming response with a 5s total duration but a **4s TTFT** feels "frozen."
+
+If your observability system only tracks total latency, you miss the core UX metric that explains why users complain about "AI slowness."
+
+---
+
+**SkyWalking 10.4: A "Digital Dashboard" from the Application Perspective**  
+The **Virtual GenAI** capability introduced in Apache SkyWalking 10.4 fills this "observability vacuum." It avoids reliance on external gateways by using application-side probes (like the Java Agent) to collect the most authentic data from the client's perspective.
+
+* **Precise Latency Distribution:** Multi-dimensional metrics (P50, P90, P99) 
help visualize LLM fluctuations to inform dynamic timeout strategies.
+* **Core UX Metric — TTFT Monitoring:** Native support for first-token latency 
in streaming calls.
+* **Multi-dimensional Model Profiling:** Aligns token usage, estimated cost, 
and performance across Providers and Models, helping you choose the most 
cost-effective solution for your specific needs.
+
+---
+
+# Virtual GenAI Observability
+
+> **Virtual GenAI** represents Generative AI service nodes detected by probe 
plugins. All performance metrics are based on the **GenAI Client Perspective**.
+
+For instance, the **Spring AI plugin** in the Java Agent detects the response 
latency of a Chat Completion request. SkyWalking then visualizes these in the 
dashboard:
+* **Traffic & Success Rate** (CPM & SLA)
+* **Latency & TTFT**
+* **Token Usage** (Input/Output)
+* **Estimated Cost**
+
+**Screenshots:**
+![provider-dashboard-1.png](provider-dashboard-1.png)
+![provider-dashboard-2.png](provider-dashboard-2.png)
+![provider-dashboard-3.png](provider-dashboard-3.png)
+![model-dashboard-1.png](model-dashboard-1.png)
+![model-dashboard-2.png](model-dashboard-2.png)
+![model-dashboard-3.png](model-dashboard-3.png)
+
+# How It Works
+
+When the SkyWalking Java Agent or OTLP probes intercept calls to mainstream AI 
frameworks (e.g., Spring AI, OpenAI SDK), they report Trace data to the 
SkyWalking OAP.
+The OAP aggregates and computes this data to generate performance metrics for 
both **Providers** and **Models**, which are then rendered in the built-in 
Virtual-GenAI dashboards.
+
+# Installation & Configuration
+
+## Requirements
+* **SkyWalking Java Agent:** >= 9.7
+* **SkyWalking OAP:** >= 10.4
+
+## Semantic Conventions & Compatibility
+SkyWalking Virtual GenAI follows **OpenTelemetry GenAI Semantic Conventions**. 
OAP identifies GenAI-related Spans based on:
+
+### SkyWalking Java Agent
+* Spans must be of type `Exit`, have the `SpanLayer` attribute set to `GENAI`, and contain the `gen_ai.response.model` tag.
+
+### OTLP / Zipkin Probes
+* Spans must contain the `gen_ai.response.model` tag.
+
+For details, refer to the E2E configurations:
+* [SkyWalking Java Agent 
Reporting](https://github.com/apache/skywalking/blob/master/test/e2e-v2/cases/virtual-genai/docker-compose.yml)
+* [Probe Reporting OTLP 
Data](https://github.com/apache/skywalking/blob/master/test/e2e-v2/cases/otlp-virtual-genai/docker-compose.yml)
+* [Probe Reporting Zipkin 
Data](https://github.com/apache/skywalking/blob/master/test/e2e-v2/cases/zipkin-virtual-genai/docker-compose.yml)
+
+---
+
+# GenAI Estimated Cost Configuration
+
+## Overview
+SkyWalking provides a built-in [GenAI Billing Configuration 
File](https://github.com/apache/skywalking/blob/master/oap-server/server-starter/src/main/resources/gen-ai-config.yml).
+
+This file defines how SkyWalking maps model names from Trace data to their 
corresponding providers and estimates the token cost for each LLM call. The 
estimated cost is displayed in the SkyWalking UI alongside trace and metric 
data, helping users intuitively understand the financial impact of their GenAI 
usage.
+
+> **Important:** The pricing in this file is intended for cost estimation only 
and must not be treated as actual billing or invoice amounts. Users are advised 
to regularly verify the latest rates on the providers' official pricing pages.
+
+## Configuration Structure
+
+### Top-level Fields
+| Field | Type | Description |
+| :--- | :--- | :--- |
+| `last-updated` | `date` | The last update date of the pricing data. All 
prices are based on public billing standards announced by providers prior to 
this date. |
+| `providers` | `list` | List of GenAI provider definitions. Each entry 
contains matching rules and specific model pricing information. |
+
+### Provider Definition
+Each entry under `providers` defines a GenAI provider:
+```yaml
+providers:
+- provider: <provider-name>
+  prefix-match:
+    - <prefix-1>
+    - <prefix-2>
+  models:
+    - name: <model-name>
+      aliases: [<alias-1>, <alias-2>]
+      input-estimated-cost-per-m: <cost>
+      output-estimated-cost-per-m: <cost>
+```
+
+| Field | Type | Required | Description |
+| :--- | :--- | :--- | :--- |
+| `provider` | `string` | Yes | The provider identifier (e.g., `openai`, 
`anthropic`, `gemini`). It is displayed as the Virtual GenAI service name in 
SkyWalking. |
+| `prefix-match` | `list[string]` | Yes | A list of prefixes used to match 
model names to this provider. If a model name in the Trace data starts with any 
of these prefixes, it will be mapped to this provider. |
+| `models` | `list[model]` | No | A list of model definitions containing 
pricing information. If omitted, the system can still identify the provider but 
will not perform cost estimation. |
+
+### Model Definition
+Each entry under `models` defines the pricing for a specific model:
+
+| Field | Type | Required | Description |
+| :--- | :--- | :--- | :--- |
+| `name` | `string` | Yes | The standard model name used for matching. |
+| `aliases` | `list[string]` | No | Alternative names that should resolve to 
the same billing entry. This is useful when providers use different naming 
conventions (see the "Model Aliases" section). |
+| `input-estimated-cost-per-m` | `float` | No | Estimated cost per 1,000,000 
(one million) input (Prompt) tokens. The default unit is USD. |
+| `output-estimated-cost-per-m` | `float` | No | Estimated cost per 1,000,000 
(one million) output (Completion) tokens. The default unit is USD. |
+
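With these two per-million rates in hand, the per-call estimate is simple arithmetic. A rough sketch in Python (illustrative only, not SkyWalking's actual OAP code; the rates below are the `gpt-4o` example values used later in this post):

```python
def estimate_cost(input_tokens, output_tokens,
                  input_cost_per_m, output_cost_per_m):
    """Estimate USD cost from token counts and per-million-token rates."""
    return (input_tokens / 1_000_000 * input_cost_per_m
            + output_tokens / 1_000_000 * output_cost_per_m)

# Example gpt-4o rates: $2.5 per 1M input tokens, $10.0 per 1M output tokens
cost = estimate_cost(12_000, 3_000, 2.5, 10.0)
print(round(cost, 4))  # 0.06
```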
+## Model Matching Mechanism
+### Provider-Level Prefix Matching
+When SkyWalking receives a Trace containing a GenAI call, it determines the 
**Provider** based on the following priority order:
+
+1. **`gen_ai.provider.name` tag**: This tag is retrieved first. It follows the 
latest `OpenTelemetry` GenAI semantic conventions.
+2. **`gen_ai.system` tag**: If the above tag is missing, the system falls back 
to this legacy tag. Note: This tag is only parsed when processing OTLP or 
Zipkin format data, primarily for compatibility with older versions of 
libraries like the Python auto-instrumentation.
+3. **Prefix Matching**: If neither of the above tags exists, `SkyWalking` 
reads the `prefix-match` rules defined in `gen-ai-config.yml` and attempts to 
identify the provider by matching the **Model Name**.
+
+```yaml
+- provider: openai
+  prefix-match:
+    - gpt
+```
+Any model name starting with **gpt** (such as **gpt-4o**, **gpt-4.1-mini**, or 
**gpt-5-nano**) will be mapped to the **openai** provider.
+A single provider can have multiple prefixes:
+```yaml
+- provider: tencent
+  prefix-match:
+    - hunyuan
+    - Tencent
+```
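The three-step resolution order above can be sketched in Python (a simplified illustration; the tag names come from the semantic conventions cited earlier, while the helper function itself is hypothetical, not SkyWalking's internal API):

```python
def resolve_provider(span_tags, model_name, providers):
    """Resolve the provider for a GenAI span.

    Priority: gen_ai.provider.name tag, then the legacy
    gen_ai.system tag (OTLP/Zipkin fallback), then prefix
    matching on the model name.
    """
    if "gen_ai.provider.name" in span_tags:
        return span_tags["gen_ai.provider.name"]
    if "gen_ai.system" in span_tags:  # legacy fallback
        return span_tags["gen_ai.system"]
    for entry in providers:
        if any(model_name.startswith(p) for p in entry["prefix-match"]):
            return entry["provider"]
    return None

providers = [{"provider": "openai", "prefix-match": ["gpt"]},
             {"provider": "tencent", "prefix-match": ["hunyuan", "Tencent"]}]
print(resolve_provider({}, "gpt-4o-mini", providers))  # openai
```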
+### Model-level Longest-Prefix Matching
+Once the provider is determined, SkyWalking uses a Trie-based longest-prefix 
matching algorithm to find the best billing entry. This is crucial because 
model names returned in provider API responses often include version numbers or 
timestamps, differing from the base model name in the config.
+Example OpenAI config:
+```yaml
+models:
+- name: gpt-4o
+  input-estimated-cost-per-m: 2.5
+  output-estimated-cost-per-m: 10.0
+- name: gpt-4o-mini
+  input-estimated-cost-per-m: 0.15
+  output-estimated-cost-per-m: 0.6
+```
+Matching behavior:
+
+| Model Name in Trace | Matched Configuration Entry | Reason |
+| :--- | :--- | :--- |
+| `gpt-4o` | `gpt-4o` | Exact match |
+| `gpt-4o-2024-08-06` | `gpt-4o` | Longest prefix is `gpt-4o` |
+| `gpt-4o-mini` | `gpt-4o-mini` | Exact match (Longer prefix `gpt-4o-mini` 
takes priority over `gpt-4o`) |
+| `gpt-4o-mini-2024-07-18` | `gpt-4o-mini` | Longest prefix is `gpt-4o-mini` |
+
+This mechanism ensures versioned API model names map to the correct pricing 
tier without requiring exact full names in the configuration file.
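A minimal Python sketch of this lookup (a linear scan standing in for the actual Trie implementation, purely for illustration):

```python
def longest_prefix_match(model_name, entries):
    """Return the billing entry whose name is the longest prefix of
    model_name, or None. Mirrors the longest-prefix behavior described
    above; a real implementation would use a Trie for efficiency."""
    best = None
    for entry in entries:
        if model_name.startswith(entry["name"]):
            if best is None or len(entry["name"]) > len(best["name"]):
                best = entry
    return best

entries = [{"name": "gpt-4o"}, {"name": "gpt-4o-mini"}]
print(longest_prefix_match("gpt-4o-mini-2024-07-18", entries)["name"])
# gpt-4o-mini
```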
+
+### Model Aliases
+Some providers use different naming conventions across API responses and 
documentation. For example, Anthropic's model might appear as `claude-4-sonnet` 
or `claude-sonnet-4`. The aliases field supports both formats under a single 
billing entry:
+```yaml
+- name: claude-4-sonnet
+  aliases: [claude-sonnet-4]
+  input-estimated-cost-per-m: 3.0
+  output-estimated-cost-per-m: 15.0
+```
+
+Under this configuration, `claude-4-sonnet` and `claude-sonnet-4` (as well as 
any versioned variants, such as `claude-sonnet-4-20250514`) will resolve to the 
same **billing entry**.  
+**Note:** Aliases also participate in **longest prefix matching**. Therefore, 
`claude-sonnet-4-20250514` will match the alias `claude-sonnet-4`, which in 
turn resolves to the pricing information for `claude-4-sonnet`.
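Conceptually, alias handling just registers extra names for the same billing entry before the prefix lookup runs. A hypothetical sketch (the helper is illustrative, not SkyWalking code):

```python
def expand_aliases(models):
    """Map every model name and alias to its canonical billing entry.
    In the real lookup these names also participate in longest-prefix
    matching, so versioned variants resolve through their alias."""
    lookup = {}
    for m in models:
        lookup[m["name"]] = m
        for alias in m.get("aliases", []):
            lookup[alias] = m
    return lookup

models = [{"name": "claude-4-sonnet", "aliases": ["claude-sonnet-4"],
           "input-estimated-cost-per-m": 3.0,
           "output-estimated-cost-per-m": 15.0}]
print(expand_aliases(models)["claude-sonnet-4"]["name"])  # claude-4-sonnet
```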
+
+## Custom Configuration
+### Adding a New Provider
+To add a provider that is not included in the default configuration:
+
+```yaml
+providers:
+# ... Existing providers ...
+
+- provider: ollama
+  prefix-match:
+    - mymodel
+  models:
+    - name: mymodel-large
+      input-estimated-cost-per-m: 1.0
+      output-estimated-cost-per-m: 5.0
+    - name: mymodel-small
+      input-estimated-cost-per-m: 0.1
+      output-estimated-cost-per-m: 0.5
+```
+
+For OTLP/Zipkin data, a dedicated estimated tag has been added. You can now 
view the cost of each GenAI call directly on the UI.
+![otlp-estimated-tag](otlp-estimated-tag.png)
+
+# Main Metrics
+## 1. Provider Level
+
+| Metric ID | Description | Meaning |
+| :--- | :--- | :--- |
+| `gen_ai_provider_cpm` | Calls Per Minute | Requests per minute (Throughput) |
+| `gen_ai_provider_sla` | Success Rate | Request success rate |
+| `gen_ai_provider_resp_time` | Avg Response Time | Average response time |
+| `gen_ai_provider_latency_percentile` | Latency Percentiles | Response time 
percentiles (P50, P75, P90, P95, P99) |
+| `gen_ai_provider_input_tokens_sum/avg` | Input Token Usage | Total and 
average input token usage |
+| `gen_ai_provider_output_tokens_sum/avg` | Output Token Usage | Total and 
average output token usage |
+| `gen_ai_provider_total_estimated_cost/avg` | Estimated Cost | Total 
estimated cost and average cost per call |
+
+## 2. Model Level
+
+| Metric ID | Description | Meaning |
+| :--- | :--- | :--- |
+| `gen_ai_model_call_cpm` | Calls Per Minute | Requests per minute for this 
specific model |
+| `gen_ai_model_sla` | Success Rate | Model-specific request success rate |
+| `gen_ai_model_latency_avg/percentile` | Latency | Average and percentiles of 
model response duration |
+| `gen_ai_model_ttft_avg/percentile` | TTFT | Time to First Token (Streaming 
only) |
+| `gen_ai_model_input_tokens_sum/avg` | Input Token Usage | Detailed input 
token consumption for the model |
+| `gen_ai_model_output_tokens_sum/avg` | Output Token Usage | Detailed output 
token consumption for the model |
+| `gen_ai_model_total_estimated_cost/avg` | Estimated Cost | Estimated total 
cost and average cost for the model |
+
+## Recommended Usage Scenarios
+
+* Performance Evaluation: Use Latency and Time to First Token (TTFT) metrics 
to analyze model inference efficiency and the end-user interaction experience.
+* Token Monitoring: Real-time monitoring of Input and Output token consumption 
to analyze resource utilization across different business scenarios.
+* Cost Alerting: Set alert thresholds based on Estimated Cost or token 
consumption to promptly detect abnormal calls and prevent budget overruns.
diff --git 
a/content/blog/2026-04-05-virtual-genai-monitoring/model-dashboard-1.png 
b/content/blog/2026-04-05-virtual-genai-monitoring/model-dashboard-1.png
new file mode 100644
index 00000000000..47402780037
Binary files /dev/null and 
b/content/blog/2026-04-05-virtual-genai-monitoring/model-dashboard-1.png differ
diff --git 
a/content/blog/2026-04-05-virtual-genai-monitoring/model-dashboard-2.png 
b/content/blog/2026-04-05-virtual-genai-monitoring/model-dashboard-2.png
new file mode 100644
index 00000000000..65bca104ced
Binary files /dev/null and 
b/content/blog/2026-04-05-virtual-genai-monitoring/model-dashboard-2.png differ
diff --git 
a/content/blog/2026-04-05-virtual-genai-monitoring/model-dashboard-3.png 
b/content/blog/2026-04-05-virtual-genai-monitoring/model-dashboard-3.png
new file mode 100644
index 00000000000..3fe6b08aa10
Binary files /dev/null and 
b/content/blog/2026-04-05-virtual-genai-monitoring/model-dashboard-3.png differ
diff --git 
a/content/blog/2026-04-05-virtual-genai-monitoring/otlp-estimated-tag.png 
b/content/blog/2026-04-05-virtual-genai-monitoring/otlp-estimated-tag.png
new file mode 100644
index 00000000000..6970f8e3420
Binary files /dev/null and 
b/content/blog/2026-04-05-virtual-genai-monitoring/otlp-estimated-tag.png differ
diff --git 
a/content/blog/2026-04-05-virtual-genai-monitoring/provider-dashboard-1.png 
b/content/blog/2026-04-05-virtual-genai-monitoring/provider-dashboard-1.png
new file mode 100644
index 00000000000..30c37ee74b5
Binary files /dev/null and 
b/content/blog/2026-04-05-virtual-genai-monitoring/provider-dashboard-1.png 
differ
diff --git 
a/content/blog/2026-04-05-virtual-genai-monitoring/provider-dashboard-2.png 
b/content/blog/2026-04-05-virtual-genai-monitoring/provider-dashboard-2.png
new file mode 100644
index 00000000000..65bcd41ce44
Binary files /dev/null and 
b/content/blog/2026-04-05-virtual-genai-monitoring/provider-dashboard-2.png 
differ
diff --git 
a/content/blog/2026-04-05-virtual-genai-monitoring/provider-dashboard-3.png 
b/content/blog/2026-04-05-virtual-genai-monitoring/provider-dashboard-3.png
new file mode 100644
index 00000000000..ac5c374249a
Binary files /dev/null and 
b/content/blog/2026-04-05-virtual-genai-monitoring/provider-dashboard-3.png 
differ
diff --git a/content/zh/2026-04-05-virtual-genai-monitoring/index.md 
b/content/zh/2026-04-05-virtual-genai-monitoring/index.md
new file mode 100644
index 00000000000..128847cb13a
--- /dev/null
+++ b/content/zh/2026-04-05-virtual-genai-monitoring/index.md
@@ -0,0 +1,240 @@
+---
+title: "Monitoring LLM Applications with SkyWalking 10.4: Insights into LLM Performance and Cost"
+author: "YiMing Shao"
+date: 2026-04-05
+description: "SkyWalking 10.4 adds the Virtual GenAI dashboard. This article introduces how to use this new feature."
+tags:
+- GenAI
+- LLM
+- Observability
+- OpenTelemetry
+---
+
+# The Problem: As Applications "Consume" LLMs, Monitoring Leaves a Blind Spot
+With the deep penetration of Generative AI (GenAI) into enterprise workflows, developers face an awkward situation: we can quickly integrate powerful LLM capabilities into applications via `Spring AI` or the `OpenAI SDK`, yet we know almost nothing about how these calls actually behave.
+
+
+1. **The "Black Box" of Cost and Performance: Is the Expensive Model Really More Cost-Effective?**  
+Facing high LLM bills, we often only know the money went to some `Provider`, but cannot work out the "return on investment" inside the application.
+Blind model upgrades: chasing a better experience, you may have switched your business to a more expensive flagship model by default. But in your specific scenario, does paying several times more per token actually deliver lower latency and a faster TTFT (Time to First Token) on real requests?
+Lack of real-world benchmarks: detached from real business requests, official benchmarks mean little. You need to know which model under the same `Provider` strikes the best balance between "Token/Cost consumption" and "response speed" at your actual prompt lengths and concurrency levels. Without application-side data, there is no way to judge which model is the best fit for your business.
+
+
+2. **The Vanishing "Golden Timeout"**  
+Many teams set timeouts for LLM calls arbitrarily (e.g., 30s or 60s).  
+Too short: during long-text generation or peak periods, requests are frequently cut off, causing business failure rates to soar.  
+Too long: if a downstream provider hangs, requests pile up in application memory, blocking execution threads and potentially bringing down the entire Java application or even the microservice cluster.
+Only by truly mastering the overall call latency (P99/P95 Latency) can you set rational timeout policies for each model based on data rather than intuition.
+
+
+3. **The Overlooked Experience Killer: TTFT**  
+In GenAI scenarios, a user's perception of "fast" depends less on the total duration of the conversation and more on **"when the first words appear."**
+A streaming response with a 10s total duration but a TTFT of only 500ms feels instantaneous.
+A non-streaming response with a 5s total duration but a 4s TTFT feels "frozen."
+If your observability system only shows total latency, you miss the core UX metric and cannot explain why users report "the AI is slow" even when total latency looks acceptable.
+
+
+**SkyWalking 10.4: A "Digital Dashboard" from the Application Perspective**  
+The Virtual GenAI capability introduced in Apache SkyWalking 10.4 is designed to fill exactly this application-side "observability vacuum." It relies on no external gateway; instead, application-side probes (such as the Java Agent) collect the most authentic data directly from the client's perspective.  
+- Precise latency distribution (Latency Percentiles): multi-dimensional metrics such as P50, P90, and P99 outline the real fluctuation curve of LLM calls, providing a scientific basis for dynamic timeout settings.  
+- Core UX metric (TTFT) monitoring: native support for first-token latency statistics in streaming calls. By comparing TTFT across providers or models, you can optimize prompt strategies or switch to a faster model to keep the user experience responsive.  
+- Multi-dimensional model profiling: aligns token consumption, estimated cost, and performance metrics across the Provider and Model dimensions. Instead of a vendor's network-wide "ideal averages," you see how your application actually performs when calling a specific model, so you can pick the most cost-effective option in a complex model ecosystem.
+
+# Virtual GenAI Observability
+
+> **Virtual GenAI** represents Generative AI service nodes detected by probe plugins. All performance metrics of GenAI operations are based on the **GenAI client perspective**.
+
+For example, the **Spring AI plugin** in the Java Agent can detect the response latency of a Chat Completion request. SkyWalking then presents the following in the dashboards:
+* **Traffic & Success Rate** (CPM & SLA)
+* **Latency & TTFT**
+* **Token Usage** (Input/Output)
+* **Estimated Cost**
+
+**Screenshots:**
+![provider-dashboard-1.png](provider-dashboard-1.png)
+![provider-dashboard-2.png](provider-dashboard-2.png)
+![provider-dashboard-3.png](provider-dashboard-3.png)
+![model-dashboard-1.png](model-dashboard-1.png)
+![model-dashboard-2.png](model-dashboard-2.png)
+![model-dashboard-3.png](model-dashboard-3.png)
+
+# How It Works
+
+When the SkyWalking Java Agent or OTLP probes intercept calls to mainstream AI frameworks (such as Spring AI or the OpenAI SDK), they report Trace data to the SkyWalking OAP.
+The OAP automatically aggregates and computes these Traces, generating performance metrics for both the **Provider** and **Model** dimensions, which are rendered directly in the built-in Virtual-GenAI dashboards.
+
+# Installation & Configuration
+## Requirements
+### Version Requirements
+* SkyWalking Java Agent: >= 9.7
+* SkyWalking OAP: >= 10.4
+
+### Semantic Conventions & Compatibility
+SkyWalking Virtual GenAI follows the `OpenTelemetry GenAI` semantic conventions. The OAP identifies GenAI-related Spans based on the following criteria:
+
+#### SkyWalking Java Agent
+* Reported Spans must be of type `Exit`, have the `SpanLayer` attribute set to `GENAI`, and contain the `gen_ai.response.model` tag.
+
+#### Probes Reporting OTLP / Zipkin Data
+* Reported Spans must contain the `gen_ai.response.model` tag.
+
+For details, refer to the E2E configurations:
+* [SkyWalking Java Agent Reporting](https://github.com/apache/skywalking/blob/master/test/e2e-v2/cases/virtual-genai/docker-compose.yml)
+* [Probe Reporting OTLP Data](https://github.com/apache/skywalking/blob/master/test/e2e-v2/cases/otlp-virtual-genai/docker-compose.yml)
+* [Probe Reporting Zipkin Data](https://github.com/apache/skywalking/blob/master/test/e2e-v2/cases/zipkin-virtual-genai/docker-compose.yml)
+
+# GenAI Estimated Cost Configuration
+## Overview
+SkyWalking provides a built-in [GenAI Billing Configuration File](https://github.com/apache/skywalking/blob/master/oap-server/server-starter/src/main/resources/gen-ai-config.yml).
+
+This configuration defines how SkyWalking maps model names from Trace data to their corresponding providers and estimates the token cost of each LLM call. The estimated cost is displayed in the SkyWalking UI alongside Trace and metric data, helping users intuitively understand the estimated financial impact of their GenAI usage.
+
+> **Important:** The pricing in this file is intended for cost estimation only and must not be treated as actual billing or invoice amounts. Users are advised to regularly verify the latest rates on the providers' official pricing pages.
+
+## Configuration Structure
+### Top-level Fields
+
+| Field | Type | Description |
+| :--- | :--- | :--- |
+| `last-updated` | `date` | The last update date of the pricing data. All prices are based on public billing standards announced on providers' official sites prior to this date. |
+| `providers` | `list` | List of GenAI provider definitions. Each entry contains matching rules and specific model pricing information. |
+
+### Provider Definition
+Each entry under `providers` defines a GenAI provider:
+```yaml
+providers:
+- provider: <provider-name>
+  prefix-match:
+    - <prefix-1>
+    - <prefix-2>
+  models:
+    - name: <model-name>
+      aliases: [<alias-1>, <alias-2>]
+      input-estimated-cost-per-m: <cost>
+      output-estimated-cost-per-m: <cost>
+```
+
+| Field | Type | Required | Description |
+| :--- | :--- | :--- | :--- |
+| `provider` | `string` | Yes | The provider identifier (e.g., `openai`, `anthropic`, `gemini`). Displayed as the Virtual GenAI service name in SkyWalking. |
+| `prefix-match` | `list[string]` | Yes | A list of prefixes used to match model names to this provider. If a model name in the Trace data starts with any of these prefixes, it is mapped to this provider. |
+| `models` | `list[model]` | No | A list of model definitions containing pricing information. If omitted, the system can still identify the provider but will not perform cost estimation. |
+
+### Model Definition
+Each entry under `models` defines the pricing for a specific model:
+
+| Field | Type | Required | Description |
+| :--- | :--- | :--- | :--- |
+| `name` | `string` | Yes | The standard model name used for matching. |
+| `aliases` | `list[string]` | No | Alternative names that resolve to the same billing entry. Useful when providers use different naming conventions (see the "Model Aliases" section). |
+| `input-estimated-cost-per-m` | `float` | No | Estimated cost per 1,000,000 (one million) input (Prompt) tokens. The default unit is USD. |
+| `output-estimated-cost-per-m` | `float` | No | Estimated cost per 1,000,000 (one million) output (Completion) tokens. The default unit is USD. |
+
+## Model Matching Mechanism
+### Provider-Level Prefix Matching
+When SkyWalking receives a Trace containing a GenAI call, it determines the **Provider** in the following priority order:
+1. **`gen_ai.provider.name` tag**: This tag is checked first. It follows the latest `OpenTelemetry` GenAI semantic conventions.
+2. **`gen_ai.system` tag**: If the above tag is missing, the system falls back to this legacy tag. Note: this tag is only parsed when processing OTLP or Zipkin format data, primarily for compatibility with older libraries such as the Python auto-instrumentation.
+3. **Prefix Matching**: If neither of the above tags exists, `SkyWalking` reads the `prefix-match` rules defined in `gen-ai-config.yml` and attempts to identify the provider by matching the **model name**.
+
+```yaml
+- provider: openai
+  prefix-match:
+    - gpt
+```
+Any model name starting with **gpt** (such as **gpt-4o**, **gpt-4.1-mini**, or **gpt-5-nano**) will be mapped to the **openai** provider.
+A single provider can have multiple prefixes:
+```yaml
+- provider: tencent
+  prefix-match:
+    - hunyuan
+    - Tencent
+```
+
+### Model-Level Longest-Prefix Matching
+Once the provider is determined, SkyWalking uses a Trie-based longest-prefix matching algorithm to find the best billing entry for the model. This is crucial because model names returned in provider API responses often include version numbers or timestamps, differing from the base model name in the configuration.
+Example OpenAI configuration:
+```yaml
+models:
+- name: gpt-4o
+  input-estimated-cost-per-m: 2.5
+  output-estimated-cost-per-m: 10.0
+- name: gpt-4o-mini
+  input-estimated-cost-per-m: 0.15
+  output-estimated-cost-per-m: 0.6
+```
+Matching behavior:
+
+| Model Name in Trace | Matched Configuration Entry | Reason |
+| :--- | :--- | :--- |
+| `gpt-4o` | `gpt-4o` | Exact match |
+| `gpt-4o-2024-08-06` | `gpt-4o` | Longest prefix is `gpt-4o` |
+| `gpt-4o-mini` | `gpt-4o-mini` | Exact match (the longer prefix `gpt-4o-mini` takes priority over `gpt-4o`) |
+| `gpt-4o-mini-2024-07-18` | `gpt-4o-mini` | Longest prefix is `gpt-4o-mini` |
+
+This mechanism ensures that versioned model names returned by the API map to the correct pricing tier without requiring exact full names in the configuration file.
+
+### Model Aliases
+Some providers use different naming conventions across API responses and documentation. For example, Anthropic's model may appear in Traces as `claude-4-sonnet` or `claude-sonnet-4`. The `aliases` field lets a single billing entry support both formats:
+```yaml
+- name: claude-4-sonnet
+  aliases: [claude-sonnet-4]
+  input-estimated-cost-per-m: 3.0
+  output-estimated-cost-per-m: 15.0
+```
+
+Under this configuration, `claude-4-sonnet` and `claude-sonnet-4` (as well as any versioned variants, such as `claude-sonnet-4-20250514`) resolve to the same billing entry.  
+**Note:** Aliases also participate in longest-prefix matching. Therefore, `claude-sonnet-4-20250514` matches the alias `claude-sonnet-4`, which in turn resolves to the pricing information of `claude-4-sonnet`.
+
+## Custom Configuration
+### Adding a New Provider
+To add a provider that is not included in the default configuration:
+```yaml
+providers:
+# ... existing providers ...
+
+- provider: ollama
+  prefix-match:
+    - mymodel
+  models:
+    - name: mymodel-large
+      input-estimated-cost-per-m: 1.0
+      output-estimated-cost-per-m: 5.0
+    - name: mymodel-small
+      input-estimated-cost-per-m: 0.1
+      output-estimated-cost-per-m: 0.5
+```
+
+For OTLP/Zipkin data, a dedicated estimated-cost tag has been added, so you can see the cost of each GenAI call directly in the UI.  
+![otlp-estimated-tag](otlp-estimated-tag.png)
+
+# Main Metrics
+## 1. Provider Level
+
+| Metric ID | Description | Meaning |
+| :--- | :--- | :--- |
+| `gen_ai_provider_cpm` | Calls Per Minute | Requests per minute (throughput) |
+| `gen_ai_provider_sla` | Success Rate | Request success rate |
+| `gen_ai_provider_resp_time` | Avg Response Time | Average response time |
+| `gen_ai_provider_latency_percentile` | Latency Percentiles | Response time percentiles (P50, P75, P90, P95, P99) |
+| `gen_ai_provider_input_tokens_sum/avg` | Input Token Usage | Total and average input token usage |
+| `gen_ai_provider_output_tokens_sum/avg` | Output Token Usage | Total and average output token usage |
+| `gen_ai_provider_total_estimated_cost/avg` | Estimated Cost | Total estimated cost and average cost per call |
+
+## 2. Model Level
+
+| Metric ID | Description | Meaning |
+| :--- | :--- | :--- |
+| `gen_ai_model_call_cpm` | Calls Per Minute | Requests per minute for this specific model |
+| `gen_ai_model_sla` | Success Rate | Model-specific request success rate |
+| `gen_ai_model_latency_avg/percentile` | Latency | Average and percentiles of model response time |
+| `gen_ai_model_ttft_avg/percentile` | TTFT | Time to First Token (streaming only) |
+| `gen_ai_model_input_tokens_sum/avg` | Input Token Usage | Detailed input token consumption for the model |
+| `gen_ai_model_output_tokens_sum/avg` | Output Token Usage | Detailed output token consumption for the model |
+| `gen_ai_model_total_estimated_cost/avg` | Estimated Cost | Estimated total cost and average cost per call for the model |
+
+## Recommended Usage Scenarios
+* Performance evaluation: use Latency and Time to First Token (TTFT) metrics to analyze model inference efficiency and the end-user interaction experience.
+* Token monitoring: monitor input and output token consumption in real time to analyze resource utilization across different business scenarios.
+* Cost alerting: configure alert thresholds based on estimated cost or token consumption to promptly detect abnormal calls and prevent budget overruns.
+
diff --git 
a/content/zh/2026-04-05-virtual-genai-monitoring/model-dashboard-1.png 
b/content/zh/2026-04-05-virtual-genai-monitoring/model-dashboard-1.png
new file mode 100644
index 00000000000..47402780037
Binary files /dev/null and 
b/content/zh/2026-04-05-virtual-genai-monitoring/model-dashboard-1.png differ
diff --git 
a/content/zh/2026-04-05-virtual-genai-monitoring/model-dashboard-2.png 
b/content/zh/2026-04-05-virtual-genai-monitoring/model-dashboard-2.png
new file mode 100644
index 00000000000..65bca104ced
Binary files /dev/null and 
b/content/zh/2026-04-05-virtual-genai-monitoring/model-dashboard-2.png differ
diff --git 
a/content/zh/2026-04-05-virtual-genai-monitoring/model-dashboard-3.png 
b/content/zh/2026-04-05-virtual-genai-monitoring/model-dashboard-3.png
new file mode 100644
index 00000000000..3fe6b08aa10
Binary files /dev/null and 
b/content/zh/2026-04-05-virtual-genai-monitoring/model-dashboard-3.png differ
diff --git 
a/content/zh/2026-04-05-virtual-genai-monitoring/otlp-estimated-tag.png 
b/content/zh/2026-04-05-virtual-genai-monitoring/otlp-estimated-tag.png
new file mode 100644
index 00000000000..6970f8e3420
Binary files /dev/null and 
b/content/zh/2026-04-05-virtual-genai-monitoring/otlp-estimated-tag.png differ
diff --git 
a/content/zh/2026-04-05-virtual-genai-monitoring/provider-dashboard-1.png 
b/content/zh/2026-04-05-virtual-genai-monitoring/provider-dashboard-1.png
new file mode 100644
index 00000000000..30c37ee74b5
Binary files /dev/null and 
b/content/zh/2026-04-05-virtual-genai-monitoring/provider-dashboard-1.png differ
diff --git 
a/content/zh/2026-04-05-virtual-genai-monitoring/provider-dashboard-2.png 
b/content/zh/2026-04-05-virtual-genai-monitoring/provider-dashboard-2.png
new file mode 100644
index 00000000000..65bcd41ce44
Binary files /dev/null and 
b/content/zh/2026-04-05-virtual-genai-monitoring/provider-dashboard-2.png differ
diff --git 
a/content/zh/2026-04-05-virtual-genai-monitoring/provider-dashboard-3.png 
b/content/zh/2026-04-05-virtual-genai-monitoring/provider-dashboard-3.png
new file mode 100644
index 00000000000..ac5c374249a
Binary files /dev/null and 
b/content/zh/2026-04-05-virtual-genai-monitoring/provider-dashboard-3.png differ

