This is an automated email from the ASF dual-hosted git repository.

wusheng pushed a commit to branch feature/swip-10-envoy-ai-gateway
in repository https://gitbox.apache.org/repos/asf/skywalking.git
commit 3c875d346f56b2259f33ec813fd592c06e92e7de
Author: Wu Sheng <[email protected]>
AuthorDate: Tue Mar 31 14:28:28 2026 +0800

    Add Envoy AI Gateway monitoring documentation

    Setup guide with OTLP configuration, Kubernetes GatewayConfig example,
    metric reference tables (service, provider, model, instance), and
    access log sampling policy.
---
 .../backend/backend-envoy-ai-gateway-monitoring.md | 104 +++++++++++++++++++++
 docs/menu.yml                                      |   2 +
 .../envoy-ai-gateway-instance.json                 |  12 +--
 .../envoy_ai_gateway/envoy-ai-gateway-root.json    |  16 ++++
 .../envoy_ai_gateway/envoy-ai-gateway-service.json |  12 +--
 5 files changed, 134 insertions(+), 12 deletions(-)

diff --git a/docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md b/docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md
new file mode 100644
index 0000000000..a5a3d99cc1
--- /dev/null
+++ b/docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md
@@ -0,0 +1,104 @@
+# Envoy AI Gateway Monitoring
+
+## Envoy AI Gateway observability via OTLP
+
+[Envoy AI Gateway](https://aigateway.envoyproxy.io/) is a gateway/proxy for AI/LLM API traffic
+(OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, Google Gemini, etc.) built on top of Envoy Proxy.
+It natively emits GenAI metrics and access logs via OTLP, following the
+[OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/).
+
+SkyWalking receives OTLP metrics and logs directly on its gRPC port (11800) — no OpenTelemetry
+Collector is needed between the AI Gateway and SkyWalking OAP.
+
+### Prerequisites
+- [Envoy AI Gateway](https://aigateway.envoyproxy.io/) deployed. See the
+  [Envoy AI Gateway quickstart](https://aigateway.envoyproxy.io/docs/capabilities/quickstart/) for installation.
+
+### Data flow
+1. Envoy AI Gateway processes LLM API requests and records GenAI metrics (token usage, latency, time to first token (TTFT), and time per output token (TPOT)).
+2. The AI Gateway pushes metrics and access logs via OTLP gRPC to SkyWalking OAP.
+3.
SkyWalking OAP parses metrics with [MAL](../../concepts-and-designs/mal.md) rules and access logs
+   with [LAL](../../concepts-and-designs/lal.md) rules.
+
+### Set up
+
+The MAL rules (`envoy-ai-gateway/*`) and LAL rules (`envoy-ai-gateway`) are enabled by default
+in SkyWalking OAP. No OAP-side configuration is needed.
+
+Configure the AI Gateway to push OTLP to SkyWalking by setting these environment variables:
+
+| Env Var | Value | Purpose |
+|---------|-------|---------|
+| `OTEL_SERVICE_NAME` | Per-deployment gateway name (e.g., `my-ai-gateway`) | SkyWalking service name |
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://skywalking-oap:11800` | SkyWalking OAP gRPC receiver |
+| `OTEL_EXPORTER_OTLP_PROTOCOL` | `grpc` | OTLP transport |
+| `OTEL_METRICS_EXPORTER` | `otlp` | Enable OTLP metrics push |
+| `OTEL_LOGS_EXPORTER` | `otlp` | Enable OTLP access log push |
+| `OTEL_RESOURCE_ATTRIBUTES` | See below | Routing + instance + layer |
+
+**Required resource attributes** (in `OTEL_RESOURCE_ATTRIBUTES`):
+- `job_name=envoy-ai-gateway` — Fixed routing tag for MAL/LAL rules. Same for all AI Gateway deployments.
+- `service.instance.id=<instance-id>` — Instance identity. In Kubernetes, use the pod name via the Downward API.
+- `service.layer=ENVOY_AI_GATEWAY` — Routes access logs to the AI Gateway LAL rules.
+
+**Example:**
+```bash
+OTEL_SERVICE_NAME=my-ai-gateway
+OTEL_EXPORTER_OTLP_ENDPOINT=http://skywalking-oap:11800
+OTEL_EXPORTER_OTLP_PROTOCOL=grpc
+OTEL_METRICS_EXPORTER=otlp
+OTEL_LOGS_EXPORTER=otlp
+OTEL_RESOURCE_ATTRIBUTES=job_name=envoy-ai-gateway,service.instance.id=pod-abc123,service.layer=ENVOY_AI_GATEWAY
+```
+
+### Supported Metrics
+
+SkyWalking observes the AI Gateway as a `LAYER: ENVOY_AI_GATEWAY` service. Each gateway deployment
+is a service, and each pod is an instance. Metrics include per-provider and per-model breakdowns.
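The Downward API note in the Set up section can be sketched as a Kubernetes container `env` block. This is a minimal illustration, assuming a gateway deployment named `my-ai-gateway` and an OAP reachable at `skywalking-oap:11800`; it is not a complete manifest:

```yaml
# Illustrative container env fragment (names are assumptions, not fixed values).
# service.instance.id is taken from the pod name via the Downward API;
# $(POD_NAME) uses Kubernetes dependent environment variable expansion.
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: OTEL_SERVICE_NAME
    value: my-ai-gateway
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://skywalking-oap:11800
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: grpc
  - name: OTEL_METRICS_EXPORTER
    value: otlp
  - name: OTEL_LOGS_EXPORTER
    value: otlp
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: job_name=envoy-ai-gateway,service.instance.id=$(POD_NAME),service.layer=ENVOY_AI_GATEWAY
```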
+
+#### Service Metrics
+
+| Monitoring Panel | Unit | Metric Name | Description |
+|---|---|---|---|
+| Request CPM | calls/min | meter_envoy_ai_gw_request_cpm | Requests per minute |
+| Request Latency Avg | ms | meter_envoy_ai_gw_request_latency_avg | Average request duration |
+| Request Latency Percentile | ms | meter_envoy_ai_gw_request_latency_percentile | P50/P75/P90/P95/P99 |
+| Input Token Rate | tokens/min | meter_envoy_ai_gw_input_token_rate | Input (prompt) tokens per minute |
+| Output Token Rate | tokens/min | meter_envoy_ai_gw_output_token_rate | Output (completion) tokens per minute |
+| TTFT Avg | ms | meter_envoy_ai_gw_ttft_avg | Time to First Token (streaming only) |
+| TTFT Percentile | ms | meter_envoy_ai_gw_ttft_percentile | P50/P75/P90/P95/P99 TTFT |
+| TPOT Avg | ms | meter_envoy_ai_gw_tpot_avg | Time Per Output Token (streaming only) |
+| TPOT Percentile | ms | meter_envoy_ai_gw_tpot_percentile | P50/P75/P90/P95/P99 TPOT |
+
+#### Provider Breakdown Metrics
+
+| Monitoring Panel | Unit | Metric Name | Description |
+|---|---|---|---|
+| Provider Request CPM | calls/min | meter_envoy_ai_gw_provider_request_cpm | Requests by provider |
+| Provider Token Rate | tokens/min | meter_envoy_ai_gw_provider_token_rate | Token rate by provider |
+| Provider Latency Avg | ms | meter_envoy_ai_gw_provider_latency_avg | Latency by provider |
+
+#### Model Breakdown Metrics
+
+| Monitoring Panel | Unit | Metric Name | Description |
+|---|---|---|---|
+| Model Request CPM | calls/min | meter_envoy_ai_gw_model_request_cpm | Requests by model |
+| Model Token Rate | tokens/min | meter_envoy_ai_gw_model_token_rate | Token rate by model |
+| Model Latency Avg | ms | meter_envoy_ai_gw_model_latency_avg | Latency by model |
+| Model TTFT Avg | ms | meter_envoy_ai_gw_model_ttft_avg | TTFT by model |
+| Model TPOT Avg | ms | meter_envoy_ai_gw_model_tpot_avg | TPOT by model |
+
+#### Instance Metrics
+
+All service-level metrics are also available per instance
(pod) with the `meter_envoy_ai_gw_instance_` prefix,
+including per-provider and per-model breakdowns.
+
+### Access Log Sampling
+
+The LAL rules apply a sampling policy to reduce storage:
+- **Error responses** (HTTP status >= 400) — always persisted.
+- **Upstream failures** — always persisted.
+- **High token cost** (>= 10,000 total tokens) — persisted for cost anomaly detection.
+- Normal successful responses with low token counts are dropped.
+
+The token threshold can be adjusted in `lal/envoy-ai-gateway.yaml`.
diff --git a/docs/menu.yml b/docs/menu.yml
index bde793633a..2ca1d7f1ca 100644
--- a/docs/menu.yml
+++ b/docs/menu.yml
@@ -152,6 +152,8 @@ catalog:
       catalog:
         - name: "Virtual GenAI"
          path: "/en/setup/service-agent/virtual-genai"
+        - name: "Envoy AI Gateway"
+          path: "/en/setup/backend/backend-envoy-ai-gateway-monitoring"
     - name: "Self Observability"
       catalog:
         - name: "OAP self telemetry"
diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-instance.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-instance.json
index 7cc0e375c6..fe314a11c0 100644
--- a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-instance.json
+++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-instance.json
@@ -165,7 +165,7 @@
         }
       ],
       "widget": {
-        "title": "Time to First Token (Avg)",
+        "title": "Time to First Token Avg (TTFT)",
         "tips": "Average time to first token for streaming requests"
       }
     },
@@ -191,7 +191,7 @@
         }
       ],
       "widget": {
-        "title": "TTFT Percentile",
+        "title": "Time to First Token Percentile (TTFT)",
         "tips": "P50 / P75 / P90 / P95 / P99"
       }
     },
@@ -217,7 +217,7 @@
         }
       ],
       "widget": {
-        "title": "Time Per Output Token (Avg)",
+        "title": "Time Per Output Token Avg (TPOT)",
         "tips": "Average inter-token latency for streaming requests"
       }
     },
@@ -243,7
+243,7 @@
         }
       ],
       "widget": {
-        "title": "TPOT Percentile",
+        "title": "Time Per Output Token Percentile (TPOT)",
         "tips": "P50 / P75 / P90 / P95 / P99"
       }
     }
@@ -429,7 +429,7 @@
         }
       ],
       "widget": {
-        "title": "TTFT Avg by Model"
+        "title": "Time to First Token Avg by Model (TTFT)"
       }
     },
     {
@@ -454,7 +454,7 @@
         }
       ],
       "widget": {
-        "title": "TPOT Avg by Model"
+        "title": "Time Per Output Token Avg by Model (TPOT)"
       }
     }
   ]
diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-root.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-root.json
index 37796f89b9..d23619c7ea 100644
--- a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-root.json
+++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-root.json
@@ -7,6 +7,22 @@
       "x": 0,
       "y": 0,
       "w": 24,
+      "h": 2,
+      "i": "1",
+      "type": "Text",
+      "graph": {
+        "fontColor": "theme",
+        "backgroundColor": "theme",
+        "content": "Observe Envoy AI Gateway via OTLP metrics and access logs",
+        "fontSize": 14,
+        "textAlign": "left",
+        "url": "https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-envoy-ai-gateway-monitoring/"
+      }
+    },
+    {
+      "x": 0,
+      "y": 2,
+      "w": 24,
       "h": 52,
       "i": "0",
       "type": "Widget",
diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-service.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-service.json
index 2bf622c482..e2599eee1b 100644
--- a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-service.json
+++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-service.json
@@ -165,7 +165,7 @@
         }
       ],
       "widget": {
-        "title": "Time to First Token (Avg)",
+
"title": "Time to First Token Avg (TTFT)",
         "tips": "Average time to first token for streaming requests"
       }
     },
@@ -191,7 +191,7 @@
         }
       ],
       "widget": {
-        "title": "TTFT Percentile",
+        "title": "Time to First Token Percentile (TTFT)",
         "tips": "P50 / P75 / P90 / P95 / P99"
       }
     },
@@ -217,7 +217,7 @@
         }
       ],
       "widget": {
-        "title": "Time Per Output Token (Avg)",
+        "title": "Time Per Output Token Avg (TPOT)",
         "tips": "Average inter-token latency for streaming requests"
       }
     },
@@ -243,7 +243,7 @@
         }
       ],
       "widget": {
-        "title": "TPOT Percentile",
+        "title": "Time Per Output Token Percentile (TPOT)",
         "tips": "P50 / P75 / P90 / P95 / P99"
       }
     }
@@ -429,7 +429,7 @@
         }
       ],
       "widget": {
-        "title": "TTFT Avg by Model"
+        "title": "Time to First Token Avg by Model (TTFT)"
      }
     },
     {
@@ -454,7 +454,7 @@
         }
       ],
       "widget": {
-        "title": "TPOT Avg by Model"
+        "title": "Time Per Output Token Avg by Model (TPOT)"
       }
     }
   ]
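
The access log sampling policy described in the new doc (always keep errors, upstream failures, and high-token-cost requests; drop the rest) reduces to a simple predicate. The sketch below is illustrative only; the function and field names are assumptions, not the actual LAL rule's variables:

```python
# Illustrative sketch of the documented sampling policy (not the real LAL rule).
TOKEN_THRESHOLD = 10_000  # the doc notes this is adjustable in lal/envoy-ai-gateway.yaml


def should_persist(status_code: int, upstream_failed: bool, total_tokens: int) -> bool:
    """Return True if an access log entry should be kept in storage."""
    if status_code >= 400:        # error responses: always persisted
        return True
    if upstream_failed:           # upstream failures: always persisted
        return True
    if total_tokens >= TOKEN_THRESHOLD:  # high token cost: kept for cost anomaly detection
        return True
    return False                  # normal, low-cost success: dropped


# Examples:
# should_persist(500, False, 120)    -> True   (error response)
# should_persist(200, True, 120)     -> True   (upstream failure)
# should_persist(200, False, 25_000) -> True   (high token cost)
# should_persist(200, False, 800)    -> False  (dropped)
```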
