This is an automated email from the ASF dual-hosted git repository.

wusheng pushed a commit to branch feature/swip-10-envoy-ai-gateway
in repository https://gitbox.apache.org/repos/asf/skywalking.git
commit 3c875d346f56b2259f33ec813fd592c06e92e7de
Author: Wu Sheng <[email protected]>
AuthorDate: Tue Mar 31 14:28:28 2026 +0800

    Add Envoy AI Gateway monitoring documentation

    Setup guide with OTLP configuration, Kubernetes GatewayConfig example,
    metric reference tables (service, provider, model, instance), and
    access log sampling policy.
---
 .../backend/backend-envoy-ai-gateway-monitoring.md | 104 +++++++++++++++++++++
 docs/menu.yml                                      |   2 +
 .../envoy-ai-gateway-instance.json                 |  12 +--
 .../envoy_ai_gateway/envoy-ai-gateway-root.json    |  16 ++++
 .../envoy_ai_gateway/envoy-ai-gateway-service.json |  12 +--
 5 files changed, 134 insertions(+), 12 deletions(-)

diff --git a/docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md b/docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md
new file mode 100644
index 0000000000..a5a3d99cc1
--- /dev/null
+++ b/docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md
@@ -0,0 +1,104 @@
+# Envoy AI Gateway Monitoring
+
+## Envoy AI Gateway observability via OTLP
+
+[Envoy AI Gateway](https://aigateway.envoyproxy.io/) is a gateway/proxy for AI/LLM API traffic
+(OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, Google Gemini, etc.) built on top of Envoy Proxy.
+It natively emits GenAI metrics and access logs via OTLP, following the
+[OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/).
+
+SkyWalking receives OTLP metrics and logs directly on its gRPC port (11800) — no OpenTelemetry
+Collector is needed between the AI Gateway and SkyWalking OAP.
+
+### Prerequisites
+- [Envoy AI Gateway](https://aigateway.envoyproxy.io/) deployed. See the
+  [Envoy AI Gateway quickstart](https://aigateway.envoyproxy.io/docs/capabilities/quickstart/) for installation.
+
+### Data flow
+1. Envoy AI Gateway processes LLM API requests and records GenAI metrics (token usage, latency, time to first token (TTFT), and time per output token (TPOT)).
+2. The AI Gateway pushes metrics and access logs via OTLP gRPC to SkyWalking OAP.
+3.
SkyWalking OAP parses metrics with [MAL](../../concepts-and-designs/mal.md) rules and access logs
+   with [LAL](../../concepts-and-designs/lal.md) rules.
+
+### Set up
+
+The MAL rules (`envoy-ai-gateway/*`) and LAL rules (`envoy-ai-gateway`) are enabled by default
+in SkyWalking OAP. No OAP-side configuration is needed.
+
+Configure the AI Gateway to push OTLP to SkyWalking by setting these environment variables:
+
+| Env Var | Value | Purpose |
+|---------|-------|---------|
+| `OTEL_SERVICE_NAME` | Per-deployment gateway name (e.g., `my-ai-gateway`) | SkyWalking service name |
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://skywalking-oap:11800` | SkyWalking OAP gRPC receiver |
+| `OTEL_EXPORTER_OTLP_PROTOCOL` | `grpc` | OTLP transport |
+| `OTEL_METRICS_EXPORTER` | `otlp` | Enable OTLP metrics push |
+| `OTEL_LOGS_EXPORTER` | `otlp` | Enable OTLP access log push |
+| `OTEL_RESOURCE_ATTRIBUTES` | See below | Routing + instance + layer |
+
+**Required resource attributes** (in `OTEL_RESOURCE_ATTRIBUTES`):
+- `job_name=envoy-ai-gateway` — Fixed routing tag for MAL/LAL rules. Same for all AI Gateway deployments.
+- `service.instance.id=<instance-id>` — Instance identity. In Kubernetes, use the pod name via the Downward API.
+- `service.layer=ENVOY_AI_GATEWAY` — Routes access logs to the AI Gateway LAL rules.
+
+**Example:**
+```bash
+OTEL_SERVICE_NAME=my-ai-gateway
+OTEL_EXPORTER_OTLP_ENDPOINT=http://skywalking-oap:11800
+OTEL_EXPORTER_OTLP_PROTOCOL=grpc
+OTEL_METRICS_EXPORTER=otlp
+OTEL_LOGS_EXPORTER=otlp
+OTEL_RESOURCE_ATTRIBUTES=job_name=envoy-ai-gateway,service.instance.id=pod-abc123,service.layer=ENVOY_AI_GATEWAY
+```
+
+### Supported Metrics
+
+SkyWalking observes the AI Gateway as a `LAYER: ENVOY_AI_GATEWAY` service. Each gateway deployment
+is a service, and each pod is an instance. Metrics include per-provider and per-model breakdowns.
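The Downward API note in the Set up section can be sketched as a Kubernetes container `env` block. This is a minimal illustration, assuming a gateway deployment named `my-ai-gateway` and an OAP reachable at `skywalking-oap:11800`; it is not a complete manifest:

```yaml
# Illustrative container env fragment (names are assumptions, not fixed values).
# service.instance.id is taken from the pod name via the Downward API;
# $(POD_NAME) uses Kubernetes dependent environment variable expansion.
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: OTEL_SERVICE_NAME
    value: my-ai-gateway
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://skywalking-oap:11800
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: grpc
  - name: OTEL_METRICS_EXPORTER
    value: otlp
  - name: OTEL_LOGS_EXPORTER
    value: otlp
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: job_name=envoy-ai-gateway,service.instance.id=$(POD_NAME),service.layer=ENVOY_AI_GATEWAY
```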
+
+#### Service Metrics
+
+| Monitoring Panel | Unit | Metric Name | Description |
+|---|---|---|---|
+| Request CPM | calls/min | meter_envoy_ai_gw_request_cpm | Requests per minute |
+| Request Latency Avg | ms | meter_envoy_ai_gw_request_latency_avg | Average request duration |
+| Request Latency Percentile | ms | meter_envoy_ai_gw_request_latency_percentile | P50/P75/P90/P95/P99 |
+| Input Token Rate | tokens/min | meter_envoy_ai_gw_input_token_rate | Input (prompt) tokens per minute |
+| Output Token Rate | tokens/min | meter_envoy_ai_gw_output_token_rate | Output (completion) tokens per minute |
+| TTFT Avg | ms | meter_envoy_ai_gw_ttft_avg | Time to First Token (streaming only) |
+| TTFT Percentile | ms | meter_envoy_ai_gw_ttft_percentile | P50/P75/P90/P95/P99 TTFT |
+| TPOT Avg | ms | meter_envoy_ai_gw_tpot_avg | Time Per Output Token (streaming only) |
+| TPOT Percentile | ms | meter_envoy_ai_gw_tpot_percentile | P50/P75/P90/P95/P99 TPOT |
+
+#### Provider Breakdown Metrics
+
+| Monitoring Panel | Unit | Metric Name | Description |
+|---|---|---|---|
+| Provider Request CPM | calls/min | meter_envoy_ai_gw_provider_request_cpm | Requests by provider |
+| Provider Token Rate | tokens/min | meter_envoy_ai_gw_provider_token_rate | Token rate by provider |
+| Provider Latency Avg | ms | meter_envoy_ai_gw_provider_latency_avg | Latency by provider |
+
+#### Model Breakdown Metrics
+
+| Monitoring Panel | Unit | Metric Name | Description |
+|---|---|---|---|
+| Model Request CPM | calls/min | meter_envoy_ai_gw_model_request_cpm | Requests by model |
+| Model Token Rate | tokens/min | meter_envoy_ai_gw_model_token_rate | Token rate by model |
+| Model Latency Avg | ms | meter_envoy_ai_gw_model_latency_avg | Latency by model |
+| Model TTFT Avg | ms | meter_envoy_ai_gw_model_ttft_avg | TTFT by model |
+| Model TPOT Avg | ms | meter_envoy_ai_gw_model_tpot_avg | TPOT by model |
+
+#### Instance Metrics
+
+All service-level metrics are also available per instance
(pod) with the `meter_envoy_ai_gw_instance_` prefix,
+including per-provider and per-model breakdowns.
+
+### Access Log Sampling
+
+The LAL rules apply a sampling policy to reduce storage:
+- **Error responses** (HTTP status >= 400) — always persisted.
+- **Upstream failures** — always persisted.
+- **High token cost** (>= 10,000 total tokens) — persisted for cost anomaly detection.
+- Normal successful responses with low token counts are dropped.
+
+The token threshold can be adjusted in `lal/envoy-ai-gateway.yaml`.
diff --git a/docs/menu.yml b/docs/menu.yml
index bde793633a..2ca1d7f1ca 100644
--- a/docs/menu.yml
+++ b/docs/menu.yml
@@ -152,6 +152,8 @@ catalog:
       catalog:
         - name: "Virtual GenAI"
          path: "/en/setup/service-agent/virtual-genai"
+        - name: "Envoy AI Gateway"
+          path: "/en/setup/backend/backend-envoy-ai-gateway-monitoring"
     - name: "Self Observability"
       catalog:
         - name: "OAP self telemetry"
diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-instance.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-instance.json
index 7cc0e375c6..fe314a11c0 100644
--- a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-instance.json
+++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-instance.json
@@ -165,7 +165,7 @@
         }
       ],
       "widget": {
-        "title": "Time to First Token (Avg)",
+        "title": "Time to First Token Avg (TTFT)",
         "tips": "Average time to first token for streaming requests"
       }
     },
@@ -191,7 +191,7 @@
         }
       ],
       "widget": {
-        "title": "TTFT Percentile",
+        "title": "Time to First Token Percentile (TTFT)",
         "tips": "P50 / P75 / P90 / P95 / P99"
       }
     },
@@ -217,7 +217,7 @@
         }
       ],
       "widget": {
-        "title": "Time Per Output Token (Avg)",
+        "title": "Time Per Output Token Avg (TPOT)",
         "tips": "Average inter-token latency for streaming requests"
       }
     },
@@ -243,7
+243,7 @@
         }
       ],
       "widget": {
-        "title": "TPOT Percentile",
+        "title": "Time Per Output Token Percentile (TPOT)",
         "tips": "P50 / P75 / P90 / P95 / P99"
       }
     }
@@ -429,7 +429,7 @@
         }
       ],
       "widget": {
-        "title": "TTFT Avg by Model"
+        "title": "Time to First Token Avg by Model (TTFT)"
       }
     },
     {
@@ -454,7 +454,7 @@
         }
       ],
       "widget": {
-        "title": "TPOT Avg by Model"
+        "title": "Time Per Output Token Avg by Model (TPOT)"
       }
     }
   ]
diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-root.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-root.json
index 37796f89b9..d23619c7ea 100644
--- a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-root.json
+++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-root.json
@@ -7,6 +7,22 @@
       "x": 0,
       "y": 0,
       "w": 24,
+      "h": 2,
+      "i": "1",
+      "type": "Text",
+      "graph": {
+        "fontColor": "theme",
+        "backgroundColor": "theme",
+        "content": "Observe Envoy AI Gateway via OTLP metrics and access logs",
+        "fontSize": 14,
+        "textAlign": "left",
+        "url": "https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-envoy-ai-gateway-monitoring/"
+      }
+    },
+    {
+      "x": 0,
+      "y": 2,
+      "w": 24,
       "h": 52,
       "i": "0",
       "type": "Widget",
diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-service.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-service.json
index 2bf622c482..e2599eee1b 100644
--- a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-service.json
+++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-service.json
@@ -165,7 +165,7 @@
         }
       ],
       "widget": {
-        "title": "Time to First Token (Avg)",
+
"title": "Time to First Token Avg (TTFT)",
         "tips": "Average time to first token for streaming requests"
       }
     },
@@ -191,7 +191,7 @@
         }
       ],
       "widget": {
-        "title": "TTFT Percentile",
+        "title": "Time to First Token Percentile (TTFT)",
         "tips": "P50 / P75 / P90 / P95 / P99"
       }
     },
@@ -217,7 +217,7 @@
         }
       ],
       "widget": {
-        "title": "Time Per Output Token (Avg)",
+        "title": "Time Per Output Token Avg (TPOT)",
         "tips": "Average inter-token latency for streaming requests"
       }
     },
@@ -243,7 +243,7 @@
         }
       ],
       "widget": {
-        "title": "TPOT Percentile",
+        "title": "Time Per Output Token Percentile (TPOT)",
         "tips": "P50 / P75 / P90 / P95 / P99"
       }
     }
@@ -429,7 +429,7 @@
         }
       ],
       "widget": {
-        "title": "TTFT Avg by Model"
+        "title": "Time to First Token Avg by Model (TTFT)"
      }
     },
     {
@@ -454,7 +454,7 @@
         }
       ],
       "widget": {
-        "title": "TPOT Avg by Model"
+        "title": "Time Per Output Token Avg by Model (TPOT)"
       }
     }
   ]
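
The access log sampling policy described in the new doc (always keep errors, upstream failures, and high-token-cost requests; drop the rest) reduces to a simple predicate. The sketch below is illustrative only; the function and field names are assumptions, not the actual LAL rule's variables:

```python
# Illustrative sketch of the documented sampling policy (not the real LAL rule).
TOKEN_THRESHOLD = 10_000  # the doc notes this is adjustable in lal/envoy-ai-gateway.yaml


def should_persist(status_code: int, upstream_failed: bool, total_tokens: int) -> bool:
    """Return True if an access log entry should be kept in storage."""
    if status_code >= 400:        # error responses: always persisted
        return True
    if upstream_failed:           # upstream failures: always persisted
        return True
    if total_tokens >= TOKEN_THRESHOLD:  # high token cost: kept for cost anomaly detection
        return True
    return False                  # normal, low-cost success: dropped


# Examples:
# should_persist(500, False, 120)    -> True   (error response)
# should_persist(200, True, 120)     -> True   (upstream failure)
# should_persist(200, False, 25_000) -> True   (high token cost)
# should_persist(200, False, 800)    -> False  (dropped)
```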
