This is an automated email from the ASF dual-hosted git repository.
xuetaoli pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/dubbo-go-pixiu-samples.git
The following commit(s) were added to refs/heads/main by this push:
new 0af1179 feat: add kvcache samples (#123)
0af1179 is described below
commit 0af11792d94776e63d385070f368ca2988c0bd1f
Author: 陈乐樂 <[email protected]>
AuthorDate: Fri Mar 13 11:46:09 2026 +0800
feat: add kvcache samples (#123)
* feat: add kvcache samples
* fix some problems
* fix:fix some bugs and add headers
* feat:add integrate test
* feat: add descriptions to readme
---
README.md | 6 +-
README_CN.md | 5 +-
ai/kvcache/README.md | 12 +
ai/kvcache/README_zh.md | 12 +
ai/kvcache/mock/README.md | 40 +++
ai/kvcache/mock/README_zh.md | 40 +++
ai/kvcache/mock/pixiu/conf.yaml | 97 +++++++
ai/kvcache/mock/request.sh | 58 ++++
ai/kvcache/mock/run.sh | 216 ++++++++++++++
ai/kvcache/mock/server/app/main.go | 465 ++++++++++++++++++++++++++++++
ai/kvcache/mock/server/controller/main.go | 197 +++++++++++++
ai/kvcache/mock/server/engine-a/main.go | 231 +++++++++++++++
ai/kvcache/mock/server/engine-b/main.go | 152 ++++++++++
ai/kvcache/mock/test/pixiu_test.go | 185 ++++++++++++
ai/kvcache/real-engine/README.md | 45 +++
ai/kvcache/real-engine/README_zh.md | 45 +++
ai/kvcache/real-engine/pixiu/conf.yaml | 101 +++++++
ai/kvcache/real-engine/request.sh | 37 +++
ai/kvcache/real-engine/test/pixiu_test.go | 110 +++++++
ai/kvcache/real-engine/verify.sh | 372 ++++++++++++++++++++++++
start_integrate_test.sh | 2 +
21 files changed, 2426 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index 57afe11..514f239 100644
--- a/README.md
+++ b/README.md
@@ -79,6 +79,10 @@ This project includes multiple samples covering conversions
such as HTTP to Dubb
* `authserver`: OAuth2 authorization server implementation providing full
authorization code flow with PKCE, JWT token generation, and validation
+* **ai**: AI gateway sample collection
+
+ * `kvcache`: Two-layer KVCache samples. `mock` validates Pixiu + kvcache
filter integration locally (tokenize/lookup/pin/route hint); `real-engine` is
for BYOE integration with real vLLM + LMCache controller.
+
* **xds**: Pixiu integration with xDS
## Other Projects in the Dubbo-Go-Pixiu Ecosystem
@@ -102,4 +106,4 @@ If you’d like to add new examples, please follow these
steps:
## License
-This project is licensed under the [Apache License 2.0](LICENSE).
\ No newline at end of file
+This project is licensed under the [Apache License 2.0](LICENSE).
diff --git a/README_CN.md b/README_CN.md
index 865f9ba..f78ed89 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -73,6 +73,9 @@
- xds:pixiu 集成 xds
+- ai:AI 网关示例集合
+ - ai/kvcache:提供两层 KVCache 示例。`mock` 用于本地快速验证 Pixiu + kvcache filter
链路(tokenize/lookup/pin/route hint);`real-engine` 用于 BYOE 场景,对接真实 vLLM + LMCache
controller。
+
## Dubbo-go-pixiu 生态系统的其他项目
-
**[pixiu-admin](https://github.com/apache/dubbo-go-pixiu/tree/develop/admin)**
Dubbo-go-pixiu Admin 是 dubbo-go-pixiu 网关的综合管理平台。它提供了一个集中的控制面板,用于通过基于 Web 的用户界面和
RESTful API 来配置、监控和管理网关资源。
@@ -89,4 +92,4 @@
## 许可证
-本项目采用 [Apache License 2.0](LICENSE) 开源许可。
\ No newline at end of file
+本项目采用 [Apache License 2.0](LICENSE) 开源许可。
diff --git a/ai/kvcache/README.md b/ai/kvcache/README.md
new file mode 100644
index 0000000..02b88f0
--- /dev/null
+++ b/ai/kvcache/README.md
@@ -0,0 +1,12 @@
+# Pixiu KVCache Samples Index
+
+[中文](./README_zh.md) | English
+
+This directory provides two KVCache samples and follows the HOWTO-style sample
layout (`pixiu/`, `test/`, request scripts).
+
+## Samples
+
+| Sample | English README | Chinese README | Purpose |
+| --- | --- | --- | --- |
+| `mock/` | [`ai/kvcache/mock/README.md`](./mock/README.md) |
[`ai/kvcache/mock/README_zh.md`](./mock/README_zh.md) | Local validation for
tokenize/lookup/pin/route-hint integration. |
+| `real-engine/` |
[`ai/kvcache/real-engine/README.md`](./real-engine/README.md) |
[`ai/kvcache/real-engine/README_zh.md`](./real-engine/README_zh.md) | BYOE
integration with real vLLM + LMCache controller. |
diff --git a/ai/kvcache/README_zh.md b/ai/kvcache/README_zh.md
new file mode 100644
index 0000000..24db4bc
--- /dev/null
+++ b/ai/kvcache/README_zh.md
@@ -0,0 +1,12 @@
+# Pixiu KVCache 示例索引
+
+[English](./README.md) | 中文
+
+本目录提供两层 KVCache 示例,并遵循 HOWTO 约定的 sample 结构(`pixiu/`、`test/`、请求脚本)。
+
+## 示例列表
+
+| 示例 | English README | 中文 README | 目标 |
+| --- | --- | --- | --- |
+| `mock/` | [`ai/kvcache/mock/README.md`](./mock/README.md) |
[`ai/kvcache/mock/README_zh.md`](./mock/README_zh.md) | 本地验证
tokenize/lookup/pin/route hint 链路。 |
+| `real-engine/` |
[`ai/kvcache/real-engine/README.md`](./real-engine/README.md) |
[`ai/kvcache/real-engine/README_zh.md`](./real-engine/README_zh.md) | BYOE 对接真实
vLLM + LMCache controller。 |
diff --git a/ai/kvcache/mock/README.md b/ai/kvcache/mock/README.md
new file mode 100644
index 0000000..2c99fb4
--- /dev/null
+++ b/ai/kvcache/mock/README.md
@@ -0,0 +1,40 @@
+# KVCache Mock Sample
+
+[Back to KVCache Index](../README.md) | English | [中文](./README_zh.md)
+
+## Layout
+
+- `pixiu/conf.yaml`: Pixiu configuration
+- `server/controller/main.go`: mock LMCache controller (`/lookup` `/pin`
`/compress` `/evict`)
+- `server/engine-a/main.go`: mock engine A (`/tokenize` +
`/v1/chat/completions`)
+- `server/engine-b/main.go`: mock engine B (`/v1/chat/completions`)
+- `request.sh`: request script
+- `test/pixiu_test.go`: integration test
+- `run.sh`: one-command startup + validation
+
+## Quick Start (CLI)
+
+1. Start sample services and Pixiu:
+
+```bash
+cd ai/kvcache/mock
+./run.sh
+```
+
+2. Send manual requests:
+
+```bash
+./request.sh
+```
+
+3. Run test case:
+
+```bash
+go test -v ./test/pixiu_test.go
+```
+
+## Verification Targets
+
+- tokenize call works (`engine-a`)
+- lookup/pin calls work (`controller`)
+- second same prompt is routed to preferred endpoint (`mock-llm-b`)
diff --git a/ai/kvcache/mock/README_zh.md b/ai/kvcache/mock/README_zh.md
new file mode 100644
index 0000000..4a8a9ee
--- /dev/null
+++ b/ai/kvcache/mock/README_zh.md
@@ -0,0 +1,40 @@
+# KVCache Mock 示例
+
+[返回 KVCache 索引](../README_zh.md) | [English](./README.md) | 中文
+
+## 目录结构
+
+- `pixiu/conf.yaml`: Pixiu 配置
+- `server/controller/main.go`: mock LMCache controller(`/lookup` `/pin`
`/compress` `/evict`)
+- `server/engine-a/main.go`: mock 引擎 A(`/tokenize` + `/v1/chat/completions`)
+- `server/engine-b/main.go`: mock 引擎 B(`/v1/chat/completions`)
+- `request.sh`: 请求脚本
+- `test/pixiu_test.go`: 集成测试
+- `run.sh`: 一键启动并验收
+
+## 命令行快速开始
+
+1. 启动 sample 服务和 Pixiu:
+
+```bash
+cd ai/kvcache/mock
+./run.sh
+```
+
+2. 手工发请求:
+
+```bash
+./request.sh
+```
+
+3. 运行测试用例:
+
+```bash
+go test -v ./test/pixiu_test.go
+```
+
+## 验证目标
+
+- tokenize 调用生效(`engine-a`)
+- lookup/pin 调用生效(`controller`)
+- 同 prompt 第二次请求路由到 preferred endpoint(`mock-llm-b`)
diff --git a/ai/kvcache/mock/pixiu/conf.yaml b/ai/kvcache/mock/pixiu/conf.yaml
new file mode 100644
index 0000000..7c50c85
--- /dev/null
+++ b/ai/kvcache/mock/pixiu/conf.yaml
@@ -0,0 +1,97 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+---
+static_resources:
+ listeners:
+ - name: "kvcache_mock"
+ protocol_type: "HTTP"
+ address:
+ socket_address:
+ address: "0.0.0.0"
+ port: 18888
+ filter_chains:
+ filters:
+ - name: dgp.filter.httpconnectionmanager
+ config:
+ route_config:
+ routes:
+ - match:
+ prefix: "/v1/chat/completions"
+ route:
+ cluster: "mock_llm"
+ cluster_not_found_response_code: 505
+ http_filters:
+ - name: dgp.filter.ai.kvcache
+ config:
+ enabled: true
+ vllm_endpoint: "http://127.0.0.1:18091"
+ lmcache_endpoint: "http://127.0.0.1:18081"
+ default_model: "mock-model"
+ request_timeout: 3s
+ lookup_routing_timeout: 80ms
+ hot_window: 2m
+ hot_max_records: 100
+ token_cache:
+ enabled: true
+ max_size: 1024
+ ttl: 10m
+ cache_strategy:
+ enable_compression: true
+ enable_pinning: true
+ enable_eviction: true
+ memory_threshold: 0.000001
+ hot_content_threshold: 1
+ load_threshold: 0.000001
+ pin_instance_id: "mock-llm-b"
+ pin_location: "ram-b"
+ compress_instance_id: "mock-llm-b"
+ compress_location: "ram-b"
+ compress_method: "zstd"
+ evict_instance_id: "mock-llm-a"
+ - name: dgp.filter.llm.proxy
+ config:
+ timeout: 30s
+ maxIdleConns: 100
+ maxIdleConnsPerHost: 100
+ maxConnsPerHost: 100
+ scheme: "http"
+ config:
+ idle_timeout: 30s
+ read_timeout: 30s
+ write_timeout: 30s
+
+ clusters:
+ - name: "mock_llm"
+ lb_policy: "round_robin"
+ endpoints:
+ - id: "mock-llm-a"
+ socket_address:
+ address: "127.0.0.1"
+ port: 18091
+ - id: "mock-llm-b"
+ socket_address:
+ address: "127.0.0.1"
+ port: 18092
+
+ shutdown_config:
+ timeout: "10s"
+ step_timeout: "2s"
+ reject_policy: "immediacy"
+
+metric:
+ enable: true
+ prometheus_port: 2222
diff --git a/ai/kvcache/mock/request.sh b/ai/kvcache/mock/request.sh
new file mode 100755
index 0000000..56a45ee
--- /dev/null
+++ b/ai/kvcache/mock/request.sh
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+#!/usr/bin/env bash
+set -euo pipefail
+
+PIXIU_URL="${PIXIU_URL:-http://127.0.0.1:18888}"
+CONTROLLER_URL="${CONTROLLER_URL:-http://127.0.0.1:18081}"
+ENGINE_A_URL="${ENGINE_A_URL:-http://127.0.0.1:18091}"
+ENGINE_B_URL="${ENGINE_B_URL:-http://127.0.0.1:18092}"
+
+export NO_PROXY="${NO_PROXY:-127.0.0.1,localhost}"
+export no_proxy="${no_proxy:-127.0.0.1,localhost}"
+
+body='{"model":"mock-model","messages":[{"role":"user","content":"route this
same prompt"}]}'
+
+echo "reset stats"
+curl -fsS -X POST "${CONTROLLER_URL}/reset" >/dev/null
+curl -fsS -X POST "${ENGINE_A_URL}/reset" >/dev/null
+curl -fsS -X POST "${ENGINE_B_URL}/reset" >/dev/null
+
+echo "request #1"
+curl -fsS -H 'Content-Type: application/json' -X POST
"${PIXIU_URL}/v1/chat/completions" -d "${body}"
+echo ""
+
+sleep 0.4
+echo "request #2"
+curl -fsS -H 'Content-Type: application/json' -X POST
"${PIXIU_URL}/v1/chat/completions" -d "${body}"
+echo ""
+
+sleep 0.8
+echo "controller stats"
+curl -fsS "${CONTROLLER_URL}/stats"
+echo ""
+
+echo "engine-a stats"
+curl -fsS "${ENGINE_A_URL}/stats"
+echo ""
+
+echo "engine-b stats"
+curl -fsS "${ENGINE_B_URL}/stats"
+echo ""
diff --git a/ai/kvcache/mock/run.sh b/ai/kvcache/mock/run.sh
new file mode 100755
index 0000000..745b278
--- /dev/null
+++ b/ai/kvcache/mock/run.sh
@@ -0,0 +1,216 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PIXIU_CONFIG="${PIXIU_CONFIG:-${SCRIPT_DIR}/pixiu/conf.yaml}"
+PIXIU_URL="${PIXIU_URL:-http://127.0.0.1:18888}"
+LMCACHE_ADMIN="${LMCACHE_ADMIN:-http://127.0.0.1:18081}"
+ENGINE_A_ADMIN="${ENGINE_A_ADMIN:-http://127.0.0.1:18091}"
+ENGINE_B_ADMIN="${ENGINE_B_ADMIN:-http://127.0.0.1:18092}"
+PIXIU_SOURCE="${PIXIU_SOURCE:-$(cd "${SCRIPT_DIR}/../../../.." &&
pwd)/dubbo-go-pixiu}"
+GO_CACHE_DIR="${GO_CACHE_DIR:-/tmp/go-build-cache}"
+GO_MOD_CACHE_DIR="${GO_MOD_CACHE_DIR:-/tmp/go-mod-cache}"
+export NO_PROXY="${NO_PROXY:-127.0.0.1,localhost}"
+export no_proxy="${no_proxy:-127.0.0.1,localhost}"
+
+WORK_DIR="$(mktemp -d /tmp/kvcache-mock-XXXXXX)"
+CONTROLLER_LOG="${WORK_DIR}/mock_controller.log"
+ENGINE_A_LOG="${WORK_DIR}/mock_engine_a.log"
+ENGINE_B_LOG="${WORK_DIR}/mock_engine_b.log"
+PIXIU_LOG="${WORK_DIR}/pixiu.log"
+
+CONTROLLER_PID=""
+ENGINE_A_PID=""
+ENGINE_B_PID=""
+PIXIU_PID=""
+
+REQ_BODY='{"model":"mock-model","messages":[{"role":"user","content":"please
explain kv cache routing"}]}'
+
+cleanup() {
+ set +e
+ if [[ -n "${PIXIU_PID}" ]] && kill -0 "${PIXIU_PID}" >/dev/null 2>&1; then
+ kill "${PIXIU_PID}" >/dev/null 2>&1
+ wait "${PIXIU_PID}" >/dev/null 2>&1
+ fi
+ if [[ -n "${CONTROLLER_PID}" ]] && kill -0 "${CONTROLLER_PID}" >/dev/null
2>&1; then
+ kill "${CONTROLLER_PID}" >/dev/null 2>&1
+ wait "${CONTROLLER_PID}" >/dev/null 2>&1
+ fi
+ if [[ -n "${ENGINE_A_PID}" ]] && kill -0 "${ENGINE_A_PID}" >/dev/null 2>&1;
then
+ kill "${ENGINE_A_PID}" >/dev/null 2>&1
+ wait "${ENGINE_A_PID}" >/dev/null 2>&1
+ fi
+ if [[ -n "${ENGINE_B_PID}" ]] && kill -0 "${ENGINE_B_PID}" >/dev/null 2>&1;
then
+ kill "${ENGINE_B_PID}" >/dev/null 2>&1
+ wait "${ENGINE_B_PID}" >/dev/null 2>&1
+ fi
+}
+trap cleanup EXIT INT TERM
+
+extract_num() {
+ local json="$1"
+ local key="$2"
+ sed -n "s/.*\"${key}\":\([0-9][0-9]*\).*/\1/p" <<<"${json}"
+}
+
+extract_str() {
+ local json="$1"
+ local key="$2"
+ sed -n "s/.*\"${key}\":\"\([^\"]*\)\".*/\1/p" <<<"${json}"
+}
+
+wait_for_health() {
+ local url="$1"
+ for _ in $(seq 1 80); do
+ if curl -fsS "${url}" >/dev/null 2>&1; then
+ return 0
+ fi
+ sleep 0.2
+ done
+ return 1
+}
+
+wait_for_pixiu() {
+ for _ in $(seq 1 80); do
+ local status
+ status="$(curl -s -o "${WORK_DIR}/kvcache-pixiu-ready.out" -w
'%{http_code}' \
+ -H 'Content-Type: application/json' \
+ -X POST "${PIXIU_URL}/v1/chat/completions" \
+ -d "${REQ_BODY}" || true)"
+ if [[ "${status}" == "200" || "${status}" == "4"* || "${status}" == "5"*
]]; then
+ return 0
+ fi
+ sleep 0.2
+ done
+ return 1
+}
+
+start_mocks() {
+ (
+ cd "${SCRIPT_DIR}"
+ env GOCACHE="${GO_CACHE_DIR}" GOMODCACHE="${GO_MOD_CACHE_DIR}" go run
./server/controller >"${CONTROLLER_LOG}" 2>&1
+ ) &
+ CONTROLLER_PID="$!"
+
+ (
+ cd "${SCRIPT_DIR}"
+ env GOCACHE="${GO_CACHE_DIR}" GOMODCACHE="${GO_MOD_CACHE_DIR}" go run
./server/engine-a >"${ENGINE_A_LOG}" 2>&1
+ ) &
+ ENGINE_A_PID="$!"
+
+ (
+ cd "${SCRIPT_DIR}"
+ env GOCACHE="${GO_CACHE_DIR}" GOMODCACHE="${GO_MOD_CACHE_DIR}" go run
./server/engine-b >"${ENGINE_B_LOG}" 2>&1
+ ) &
+ ENGINE_B_PID="$!"
+
+ wait_for_health "${LMCACHE_ADMIN}/health"
+ wait_for_health "${ENGINE_A_ADMIN}/health"
+ wait_for_health "${ENGINE_B_ADMIN}/health"
+}
+
+start_pixiu() {
+ if command -v pixiu >/dev/null 2>&1; then
+ pixiu gateway start -c "${PIXIU_CONFIG}" >"${PIXIU_LOG}" 2>&1 &
+ PIXIU_PID="$!"
+ elif [[ -d "${PIXIU_SOURCE}/cmd/pixiu" ]]; then
+ (
+ cd "${PIXIU_SOURCE}"
+ env GOCACHE="${GO_CACHE_DIR}" GOMODCACHE="${GO_MOD_CACHE_DIR}" go run
./cmd/pixiu/*.go gateway start -c "${PIXIU_CONFIG}"
+ ) >"${PIXIU_LOG}" 2>&1 &
+ PIXIU_PID="$!"
+ else
+ echo "cannot find pixiu binary or source. set PIXIU_SOURCE or install
pixiu in PATH"
+ return 1
+ fi
+
+ if ! wait_for_pixiu; then
+ echo "pixiu start failed, log: ${PIXIU_LOG}"
+ return 1
+ fi
+}
+
+echo "[1/4] starting mock controller + engines"
+start_mocks
+
+echo "[2/4] starting pixiu"
+start_pixiu
+
+curl -fsS -X POST "${LMCACHE_ADMIN}/reset" >/dev/null
+curl -fsS -X POST "${ENGINE_A_ADMIN}/reset" >/dev/null
+curl -fsS -X POST "${ENGINE_B_ADMIN}/reset" >/dev/null
+
+echo "[3/4] sending warmup and routed requests"
+resp1="$(curl -fsS -H 'Content-Type: application/json' -X POST
"${PIXIU_URL}/v1/chat/completions" -d "${REQ_BODY}")"
+sleep 0.6
+resp2="$(curl -fsS -H 'Content-Type: application/json' -X POST
"${PIXIU_URL}/v1/chat/completions" -d "${REQ_BODY}")"
+
+sleep 1.0
+controller_stats="$(curl -fsS "${LMCACHE_ADMIN}/stats")"
+engine_a_stats="$(curl -fsS "${ENGINE_A_ADMIN}/stats")"
+engine_b_stats="$(curl -fsS "${ENGINE_B_ADMIN}/stats")"
+
+echo "[4/4] evaluating results"
+second_served_by="$(extract_str "${resp2}" "served_by")"
+tokenize_calls="$(extract_num "${engine_a_stats}" "tokenize_calls")"
+lookup_calls="$(extract_num "${controller_stats}" "lookup_calls")"
+pin_calls="$(extract_num "${controller_stats}" "pin_calls")"
+llm_b_calls="$(extract_num "${engine_b_stats}" "chat_calls")"
+
+fail=0
+if [[ -z "${tokenize_calls}" || "${tokenize_calls}" -lt 1 ]]; then
+ echo "FAIL: tokenize was not called on engine-a"
+ fail=1
+fi
+if [[ -z "${lookup_calls}" || "${lookup_calls}" -lt 2 ]]; then
+ echo "FAIL: lookup should be called at least twice (cache build + route
hint)"
+ fail=1
+fi
+if [[ -z "${pin_calls}" || "${pin_calls}" -lt 1 ]]; then
+ echo "FAIL: pin was not called"
+ fail=1
+fi
+if [[ "${second_served_by}" != "mock-llm-b" ]]; then
+ echo "FAIL: expected routed request served_by=mock-llm-b, got
'${second_served_by}'"
+ fail=1
+fi
+if [[ -z "${llm_b_calls}" || "${llm_b_calls}" -lt 1 ]]; then
+ echo "FAIL: preferred endpoint mock-llm-b received no traffic"
+ fail=1
+fi
+
+echo ""
+echo "response#1: ${resp1}"
+echo "response#2: ${resp2}"
+echo "controller_stats: ${controller_stats}"
+echo "engine_a_stats: ${engine_a_stats}"
+echo "engine_b_stats: ${engine_b_stats}"
+echo "controller log: ${CONTROLLER_LOG}"
+echo "engine-a log: ${ENGINE_A_LOG}"
+echo "engine-b log: ${ENGINE_B_LOG}"
+echo "pixiu log: ${PIXIU_LOG}"
+
+if [[ "${fail}" -ne 0 ]]; then
+ exit 1
+fi
+
+echo "PASS: kvcache mock sample validated with 3 isolated mocks (controller +
two engines)."
diff --git a/ai/kvcache/mock/server/app/main.go
b/ai/kvcache/mock/server/app/main.go
new file mode 100644
index 0000000..957201f
--- /dev/null
+++ b/ai/kvcache/mock/server/app/main.go
@@ -0,0 +1,465 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package main
+
+import (
+ "encoding/json"
+ "fmt"
+ "log"
+ "net/http"
+ "os"
+ "strconv"
+ "strings"
+ "sync"
+ "sync/atomic"
+ "time"
+)
+
+type controllerStats struct {
+ mu sync.Mutex
+
+ lookupCalls int
+ lookupSuccess int
+ lookupFailure int
+ pinCalls int
+ compressCalls int
+ evictCalls int
+}
+
+type engineAStats struct {
+ mu sync.Mutex
+
+ tokenizeCalls int
+ chatCalls int
+}
+
+type engineBStats struct {
+ mu sync.Mutex
+
+ chatCalls int
+}
+
+type lookupRequest struct {
+ Tokens []int `json:"tokens"`
+}
+
+type tokensRequest struct {
+ Tokens []int `json:"tokens"`
+}
+
+type eventResp struct {
+ EventID string `json:"event_id"`
+ NumTokens int `json:"num_tokens"`
+}
+
+type llmRequest struct {
+ Model string `json:"model"`
+}
+
+type llmMessage struct {
+ Role string `json:"role"`
+ Content string `json:"content"`
+}
+
+type llmChoice struct {
+ Index int `json:"index"`
+ Message llmMessage `json:"message"`
+}
+
+type llmResponse struct {
+ ID string `json:"id"`
+ Object string `json:"object"`
+ Model string `json:"model"`
+ ServedBy string `json:"served_by"`
+ Choices []llmChoice `json:"choices"`
+ Usage map[string]int `json:"usage"`
+}
+
+type tokenizeResponse struct {
+ Count int `json:"count"`
+ Tokens []int `json:"tokens"`
+ MaxLen int `json:"max_model_len"`
+}
+
+var globalEventCounter uint64
+
+func main() {
+ controllerAddr := envOrDefault("LMCACHE_ADDR", ":18081")
+ engineAAddr := envOrDefault("LLM_A_ADDR", ":18091")
+ engineBAddr := envOrDefault("LLM_B_ADDR", ":18092")
+ preferredEndpoint := envOrDefault("PREFERRED_ENDPOINT_ID", "mock-llm-b")
+ engineAID := envOrDefault("LLM_A_ID", "mock-llm-a")
+ engineBID := envOrDefault("LLM_B_ID", "mock-llm-b")
+
+ controller := buildControllerMux(preferredEndpoint)
+ engineA := buildEngineAMux(engineAID)
+ engineB := buildEngineBMux(engineBID)
+
+ errCh := make(chan error, 3)
+ go serve("mock-controller", controllerAddr, controller, errCh)
+ go serve("mock-engine-a", engineAAddr, engineA, errCh)
+ go serve("mock-engine-b", engineBAddr, engineB, errCh)
+
+ err := <-errCh
+ log.Fatalf("kvcache mock app exited: %v", err)
+}
+
+func serve(name string, addr string, handler http.Handler, errCh chan<- error)
{
+ srv := &http.Server{
+ Addr: addr,
+ Handler: handler,
+ ReadHeaderTimeout: 3 * time.Second,
+ }
+ log.Printf("[%s] listening on %s", name, addr)
+ if err := srv.ListenAndServe(); err != nil && err !=
http.ErrServerClosed {
+ errCh <- fmt.Errorf("%s failed: %w", name, err)
+ }
+}
+
+func buildControllerMux(preferred string) http.Handler {
+ stats := &controllerStats{}
+ mux := http.NewServeMux()
+
+ mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
+ writeJSON(w, http.StatusOK, map[string]any{"ok": true,
"component": "mock-controller"})
+ })
+ mux.HandleFunc("/stats", func(w http.ResponseWriter, _ *http.Request) {
+ stats.mu.Lock()
+ defer stats.mu.Unlock()
+ writeJSON(w, http.StatusOK, map[string]any{
+ "lookup_calls": stats.lookupCalls,
+ "lookup_success": stats.lookupSuccess,
+ "lookup_failure": stats.lookupFailure,
+ "pin_calls": stats.pinCalls,
+ "compress_calls": stats.compressCalls,
+ "evict_calls": stats.evictCalls,
+ "preferred_endpoint": preferred,
+ "timestamp_unix_milli": time.Now().UnixMilli(),
+ })
+ })
+ mux.HandleFunc("/reset", func(w http.ResponseWriter, r *http.Request) {
+ if r.Method != http.MethodPost {
+ writeJSON(w, http.StatusMethodNotAllowed,
map[string]string{"error": "method not allowed"})
+ return
+ }
+ stats.mu.Lock()
+ stats.lookupCalls = 0
+ stats.lookupSuccess = 0
+ stats.lookupFailure = 0
+ stats.pinCalls = 0
+ stats.compressCalls = 0
+ stats.evictCalls = 0
+ stats.mu.Unlock()
+ writeJSON(w, http.StatusOK, map[string]any{"ok": true})
+ })
+ mux.HandleFunc("/lookup", func(w http.ResponseWriter, r *http.Request) {
+ if r.Method != http.MethodPost {
+ writeJSON(w, http.StatusMethodNotAllowed,
map[string]string{"error": "method not allowed"})
+ return
+ }
+ var req lookupRequest
+ if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+ stats.mu.Lock()
+ stats.lookupFailure++
+ stats.mu.Unlock()
+ writeJSON(w, http.StatusBadRequest,
map[string]string{"error": "invalid request"})
+ return
+ }
+ tokenCount := len(req.Tokens)
+ if tokenCount == 0 {
+ tokenCount = 1
+ }
+ layout := map[string]map[string]any{
+ "mock-llm-a": {"0": "ram-a", "1": max(tokenCount/2, 1)},
+ "mock-llm-b": {"0": "ram-b", "1": tokenCount + 3},
+ }
+ if preferred == "mock-llm-a" {
+ layout["mock-llm-a"]["1"] = tokenCount + 3
+ layout["mock-llm-b"]["1"] = max(tokenCount/2, 1)
+ }
+
+ stats.mu.Lock()
+ stats.lookupCalls++
+ stats.lookupSuccess++
+ stats.mu.Unlock()
+
+ writeJSON(w, http.StatusOK, map[string]any{
+ "event_id": nextEventID("lookup"),
+ "layout_info": layout,
+ })
+ })
+ mux.HandleFunc("/pin", func(w http.ResponseWriter, r *http.Request) {
+ handleTokenEvent(stats, w, r, "pin")
+ })
+ mux.HandleFunc("/compress", func(w http.ResponseWriter, r
*http.Request) {
+ handleTokenEvent(stats, w, r, "compress")
+ })
+ mux.HandleFunc("/evict", func(w http.ResponseWriter, r *http.Request) {
+ handleTokenEvent(stats, w, r, "evict")
+ })
+
+ return mux
+}
+
+func handleTokenEvent(stats *controllerStats, w http.ResponseWriter, r
*http.Request, op string) {
+ if r.Method != http.MethodPost {
+ writeJSON(w, http.StatusMethodNotAllowed,
map[string]string{"error": "method not allowed"})
+ return
+ }
+ var req tokensRequest
+ if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+ writeJSON(w, http.StatusBadRequest, map[string]string{"error":
"invalid request"})
+ return
+ }
+ stats.mu.Lock()
+ switch op {
+ case "pin":
+ stats.pinCalls++
+ case "compress":
+ stats.compressCalls++
+ case "evict":
+ stats.evictCalls++
+ }
+ stats.mu.Unlock()
+ writeJSON(w, http.StatusOK, eventResp{EventID: nextEventID(op),
NumTokens: len(req.Tokens)})
+}
+
+func buildEngineAMux(engineID string) http.Handler {
+ stats := &engineAStats{}
+ mux := http.NewServeMux()
+
+ mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
+ writeJSON(w, http.StatusOK, map[string]any{"ok": true,
"engine": engineID, "tokenize_enabled": true})
+ })
+ mux.HandleFunc("/stats", func(w http.ResponseWriter, _ *http.Request) {
+ stats.mu.Lock()
+ defer stats.mu.Unlock()
+ writeJSON(w, http.StatusOK, map[string]any{
+ "tokenize_calls": stats.tokenizeCalls,
+ "chat_calls": stats.chatCalls,
+ "engine_id": engineID,
+ "timestamp_unix_milli": time.Now().UnixMilli(),
+ })
+ })
+ mux.HandleFunc("/reset", func(w http.ResponseWriter, r *http.Request) {
+ if r.Method != http.MethodPost {
+ writeJSON(w, http.StatusMethodNotAllowed,
map[string]string{"error": "method not allowed"})
+ return
+ }
+ stats.mu.Lock()
+ stats.tokenizeCalls = 0
+ stats.chatCalls = 0
+ stats.mu.Unlock()
+ writeJSON(w, http.StatusOK, map[string]any{"ok": true})
+ })
+ mux.HandleFunc("/tokenize", func(w http.ResponseWriter, r
*http.Request) {
+ if r.Method != http.MethodPost {
+ writeJSON(w, http.StatusMethodNotAllowed,
map[string]string{"error": "method not allowed"})
+ return
+ }
+ prompt := extractPrompt(r)
+ tokens := tokenizePrompt(prompt)
+
+ stats.mu.Lock()
+ stats.tokenizeCalls++
+ stats.mu.Unlock()
+
+ writeJSON(w, http.StatusOK, tokenizeResponse{Count:
len(tokens), Tokens: tokens, MaxLen: 8192})
+ })
+ mux.HandleFunc("/v1/chat/completions", func(w http.ResponseWriter, r
*http.Request) {
+ if r.Method != http.MethodPost {
+ writeJSON(w, http.StatusMethodNotAllowed,
map[string]string{"error": "method not allowed"})
+ return
+ }
+ var req llmRequest
+ _ = json.NewDecoder(r.Body).Decode(&req)
+ if req.Model == "" {
+ req.Model = "mock-model"
+ }
+ stats.mu.Lock()
+ stats.chatCalls++
+ stats.mu.Unlock()
+
+ resp := llmResponse{
+ ID: nextEventID("chatcmpl"),
+ Object: "chat.completion",
+ Model: req.Model,
+ ServedBy: engineID,
+ Choices: []llmChoice{{
+ Index: 0,
+ Message: llmMessage{
+ Role: "assistant",
+ Content: fmt.Sprintf("mock response
from %s", engineID),
+ },
+ }},
+ Usage: map[string]int{"prompt_tokens": 8,
"completion_tokens": 8, "total_tokens": 16},
+ }
+ writeJSON(w, http.StatusOK, resp)
+ })
+
+ return mux
+}
+
+func buildEngineBMux(engineID string) http.Handler {
+ stats := &engineBStats{}
+ mux := http.NewServeMux()
+
+ mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
+ writeJSON(w, http.StatusOK, map[string]any{"ok": true,
"engine": engineID, "tokenize_enabled": false})
+ })
+ mux.HandleFunc("/stats", func(w http.ResponseWriter, _ *http.Request) {
+ stats.mu.Lock()
+ defer stats.mu.Unlock()
+ writeJSON(w, http.StatusOK, map[string]any{
+ "chat_calls": stats.chatCalls,
+ "engine_id": engineID,
+ "timestamp_unix_milli": time.Now().UnixMilli(),
+ })
+ })
+ mux.HandleFunc("/reset", func(w http.ResponseWriter, r *http.Request) {
+ if r.Method != http.MethodPost {
+ writeJSON(w, http.StatusMethodNotAllowed,
map[string]string{"error": "method not allowed"})
+ return
+ }
+ stats.mu.Lock()
+ stats.chatCalls = 0
+ stats.mu.Unlock()
+ writeJSON(w, http.StatusOK, map[string]any{"ok": true})
+ })
+ mux.HandleFunc("/tokenize", func(w http.ResponseWriter, _
*http.Request) {
+ writeJSON(w, http.StatusNotFound, map[string]string{"error":
"tokenize not available on this instance"})
+ })
+ mux.HandleFunc("/v1/chat/completions", func(w http.ResponseWriter, r
*http.Request) {
+ if r.Method != http.MethodPost {
+ writeJSON(w, http.StatusMethodNotAllowed,
map[string]string{"error": "method not allowed"})
+ return
+ }
+ var req llmRequest
+ _ = json.NewDecoder(r.Body).Decode(&req)
+ if req.Model == "" {
+ req.Model = "mock-model"
+ }
+
+ stats.mu.Lock()
+ stats.chatCalls++
+ stats.mu.Unlock()
+
+ resp := llmResponse{
+ ID: nextEventID("chatcmpl"),
+ Object: "chat.completion",
+ Model: req.Model,
+ ServedBy: engineID,
+ Choices: []llmChoice{{
+ Index: 0,
+ Message: llmMessage{
+ Role: "assistant",
+ Content: fmt.Sprintf("mock response
from %s", engineID),
+ },
+ }},
+ Usage: map[string]int{"prompt_tokens": 8,
"completion_tokens": 8, "total_tokens": 16},
+ }
+ writeJSON(w, http.StatusOK, resp)
+ })
+
+ return mux
+}
+
+func extractPrompt(r *http.Request) string {
+ if r == nil || r.Body == nil {
+ return ""
+ }
+ var payload map[string]any
+ if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
+ return ""
+ }
+ if prompt, ok := payload["prompt"]; ok {
+ switch v := prompt.(type) {
+ case string:
+ return strings.TrimSpace(v)
+ case []any:
+ parts := make([]string, 0, len(v))
+ for _, item := range v {
+ if str, ok := item.(string); ok {
+ parts = append(parts, str)
+ }
+ }
+ return strings.Join(parts, "\n")
+ }
+ }
+ messages, ok := payload["messages"].([]any)
+ if !ok {
+ return ""
+ }
+ parts := make([]string, 0, len(messages))
+ for _, item := range messages {
+ msgMap, ok := item.(map[string]any)
+ if !ok {
+ continue
+ }
+ if content, ok := msgMap["content"].(string); ok {
+ parts = append(parts, content)
+ }
+ }
+ return strings.Join(parts, "\n")
+}
+
+func tokenizePrompt(prompt string) []int {
+ prompt = strings.TrimSpace(prompt)
+ if prompt == "" {
+ return []int{0}
+ }
+ words := strings.Fields(prompt)
+ if len(words) == 0 {
+ return []int{0}
+ }
+ tokens := make([]int, 0, len(words))
+ for idx, word := range words {
+ sum := 0
+ for _, r := range word {
+ sum += int(r)
+ }
+ tokens = append(tokens, (sum%997)+idx+1)
+ }
+ return tokens
+}
+
+func nextEventID(prefix string) string {
+ n := atomic.AddUint64(&globalEventCounter, 1)
+ return prefix + "-" + strconv.FormatUint(n, 10)
+}
+
+func writeJSON(w http.ResponseWriter, status int, payload any) {
+ w.Header().Set("Content-Type", "application/json")
+ w.WriteHeader(status)
+ _ = json.NewEncoder(w).Encode(payload)
+}
+
+func envOrDefault(key string, fallback string) string {
+ val, ok := os.LookupEnv(key)
+ if !ok || strings.TrimSpace(val) == "" {
+ return fallback
+ }
+ return val
+}
+
+func max(a int, b int) int {
+ if a > b {
+ return a
+ }
+ return b
+}
diff --git a/ai/kvcache/mock/server/controller/main.go
b/ai/kvcache/mock/server/controller/main.go
new file mode 100644
index 0000000..6b90f1c
--- /dev/null
+++ b/ai/kvcache/mock/server/controller/main.go
@@ -0,0 +1,197 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package main
+
+import (
+ "encoding/json"
+ "log"
+ "net/http"
+ "os"
+ "strconv"
+ "strings"
+ "sync"
+ "sync/atomic"
+ "time"
+)
+
+type controllerStats struct {
+ mu sync.Mutex
+
+ lookupCalls int
+ lookupSuccess int
+ lookupFailure int
+ pinCalls int
+ compressCalls int
+ evictCalls int
+}
+
+type lookupRequest struct {
+ Tokens []int `json:"tokens"`
+}
+
+type tokensRequest struct {
+ Tokens []int `json:"tokens"`
+}
+
+type eventResp struct {
+ EventID string `json:"event_id"`
+ NumTokens int `json:"num_tokens"`
+}
+
+var eventCounter uint64
+
+func main() {
+ addr := envOrDefault("LMCACHE_ADDR", ":18081")
+ preferred := envOrDefault("PREFERRED_ENDPOINT_ID", "mock-llm-b")
+ stats := &controllerStats{}
+
+ mux := http.NewServeMux()
+ mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
+ writeJSON(w, http.StatusOK, map[string]any{"ok": true,
"component": "mock-controller"})
+ })
+ mux.HandleFunc("/stats", func(w http.ResponseWriter, _ *http.Request) {
+ stats.mu.Lock()
+ defer stats.mu.Unlock()
+ writeJSON(w, http.StatusOK, map[string]any{
+ "lookup_calls": stats.lookupCalls,
+ "lookup_success": stats.lookupSuccess,
+ "lookup_failure": stats.lookupFailure,
+ "pin_calls": stats.pinCalls,
+ "compress_calls": stats.compressCalls,
+ "evict_calls": stats.evictCalls,
+ "preferred_endpoint": preferred,
+ "timestamp_unix_milli": time.Now().UnixMilli(),
+ })
+ })
+ mux.HandleFunc("/reset", func(w http.ResponseWriter, r *http.Request) {
+ if r.Method != http.MethodPost {
+ writeJSON(w, http.StatusMethodNotAllowed,
map[string]string{"error": "method not allowed"})
+ return
+ }
+ stats.mu.Lock()
+ stats.lookupCalls = 0
+ stats.lookupSuccess = 0
+ stats.lookupFailure = 0
+ stats.pinCalls = 0
+ stats.compressCalls = 0
+ stats.evictCalls = 0
+ stats.mu.Unlock()
+ writeJSON(w, http.StatusOK, map[string]any{"ok": true})
+ })
+ mux.HandleFunc("/lookup", func(w http.ResponseWriter, r *http.Request) {
+ if r.Method != http.MethodPost {
+ writeJSON(w, http.StatusMethodNotAllowed,
map[string]string{"error": "method not allowed"})
+ return
+ }
+ var req lookupRequest
+ if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+ stats.mu.Lock()
+ stats.lookupFailure++
+ stats.mu.Unlock()
+ writeJSON(w, http.StatusBadRequest,
map[string]string{"error": "invalid request"})
+ return
+ }
+ tokenCount := len(req.Tokens)
+ if tokenCount == 0 {
+ tokenCount = 1
+ }
+ layout := map[string]map[string]any{
+ "mock-llm-a": {"0": "ram-a", "1": max(tokenCount/2, 1)},
+ "mock-llm-b": {"0": "ram-b", "1": tokenCount + 3},
+ }
+ if preferred == "mock-llm-a" {
+ layout["mock-llm-a"]["1"] = tokenCount + 3
+ layout["mock-llm-b"]["1"] = max(tokenCount/2, 1)
+ }
+
+ stats.mu.Lock()
+ stats.lookupCalls++
+ stats.lookupSuccess++
+ stats.mu.Unlock()
+
+ writeJSON(w, http.StatusOK, map[string]any{
+ "event_id": nextEventID("lookup"),
+ "layout_info": layout,
+ })
+ })
+ mux.HandleFunc("/pin", func(w http.ResponseWriter, r *http.Request) {
+ handleTokenEvent(stats, w, r, "pin")
+ })
+ mux.HandleFunc("/compress", func(w http.ResponseWriter, r
*http.Request) {
+ handleTokenEvent(stats, w, r, "compress")
+ })
+ mux.HandleFunc("/evict", func(w http.ResponseWriter, r *http.Request) {
+ handleTokenEvent(stats, w, r, "evict")
+ })
+
+ srv := &http.Server{Addr: addr, Handler: mux, ReadHeaderTimeout: 3 *
time.Second}
+ log.Printf("[mock-controller] listening on %s", addr)
+ if err := srv.ListenAndServe(); err != nil && err !=
http.ErrServerClosed {
+ log.Fatalf("[mock-controller] server failed: %v", err)
+ }
+}
+
+func handleTokenEvent(stats *controllerStats, w http.ResponseWriter, r
*http.Request, op string) {
+ if r.Method != http.MethodPost {
+ writeJSON(w, http.StatusMethodNotAllowed,
map[string]string{"error": "method not allowed"})
+ return
+ }
+ var req tokensRequest
+ if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+ writeJSON(w, http.StatusBadRequest, map[string]string{"error":
"invalid request"})
+ return
+ }
+
+ stats.mu.Lock()
+ switch op {
+ case "pin":
+ stats.pinCalls++
+ case "compress":
+ stats.compressCalls++
+ case "evict":
+ stats.evictCalls++
+ }
+ stats.mu.Unlock()
+
+ writeJSON(w, http.StatusOK, eventResp{EventID: nextEventID(op),
NumTokens: len(req.Tokens)})
+}
+
+func writeJSON(w http.ResponseWriter, status int, payload any) {
+ w.Header().Set("Content-Type", "application/json")
+ w.WriteHeader(status)
+ _ = json.NewEncoder(w).Encode(payload)
+}
+
+func nextEventID(prefix string) string {
+ n := atomic.AddUint64(&eventCounter, 1)
+ return prefix + "-" + strconv.FormatUint(n, 10)
+}
+
+func envOrDefault(key string, fallback string) string {
+ val, ok := os.LookupEnv(key)
+ if !ok || strings.TrimSpace(val) == "" {
+ return fallback
+ }
+ return val
+}
+
+func max(a int, b int) int {
+ if a > b {
+ return a
+ }
+ return b
+}
diff --git a/ai/kvcache/mock/server/engine-a/main.go
b/ai/kvcache/mock/server/engine-a/main.go
new file mode 100644
index 0000000..bb24e05
--- /dev/null
+++ b/ai/kvcache/mock/server/engine-a/main.go
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package main
+
+import (
+ "encoding/json"
+ "fmt"
+ "log"
+ "net/http"
+ "os"
+ "strconv"
+ "strings"
+ "sync"
+ "sync/atomic"
+ "time"
+)
+
+type engineStats struct {
+ mu sync.Mutex
+
+ tokenizeCalls int
+ chatCalls int
+}
+
+type llmRequest struct {
+ Model string `json:"model"`
+}
+
+type llmResponse struct {
+ ID string `json:"id"`
+ Object string `json:"object"`
+ Model string `json:"model"`
+ ServedBy string `json:"served_by"`
+ Choices []llmChoice `json:"choices"`
+ Usage map[string]int `json:"usage"`
+}
+
+type llmChoice struct {
+ Index int `json:"index"`
+ Message llmMessage `json:"message"`
+}
+
+type llmMessage struct {
+ Role string `json:"role"`
+ Content string `json:"content"`
+}
+
+type tokenizeResponse struct {
+ Count int `json:"count"`
+ Tokens []int `json:"tokens"`
+ MaxLen int `json:"max_model_len"`
+}
+
+var eventCounter uint64
+
+func main() {
+ addr := envOrDefault("LLM_A_ADDR", ":18091")
+ engineID := envOrDefault("LLM_A_ID", "mock-llm-a")
+ stats := &engineStats{}
+
+ mux := http.NewServeMux()
+ mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
+ writeJSON(w, http.StatusOK, map[string]any{"ok": true,
"engine": engineID, "tokenize_enabled": true})
+ })
+ mux.HandleFunc("/stats", func(w http.ResponseWriter, _ *http.Request) {
+ stats.mu.Lock()
+ defer stats.mu.Unlock()
+ writeJSON(w, http.StatusOK, map[string]any{
+ "tokenize_calls": stats.tokenizeCalls,
+ "chat_calls": stats.chatCalls,
+ "engine_id": engineID,
+ "timestamp_unix_milli": time.Now().UnixMilli(),
+ })
+ })
+ mux.HandleFunc("/reset", func(w http.ResponseWriter, r *http.Request) {
+ if r.Method != http.MethodPost {
+ writeJSON(w, http.StatusMethodNotAllowed,
map[string]string{"error": "method not allowed"})
+ return
+ }
+ stats.mu.Lock()
+ stats.tokenizeCalls = 0
+ stats.chatCalls = 0
+ stats.mu.Unlock()
+ writeJSON(w, http.StatusOK, map[string]any{"ok": true})
+ })
+ mux.HandleFunc("/tokenize", func(w http.ResponseWriter, r
*http.Request) {
+ if r.Method != http.MethodPost {
+ writeJSON(w, http.StatusMethodNotAllowed,
map[string]string{"error": "method not allowed"})
+ return
+ }
+ prompt := extractPrompt(r)
+ tokens := tokenizePrompt(prompt)
+
+ stats.mu.Lock()
+ stats.tokenizeCalls++
+ stats.mu.Unlock()
+
+ writeJSON(w, http.StatusOK, tokenizeResponse{Count:
len(tokens), Tokens: tokens, MaxLen: 8192})
+ })
+ mux.HandleFunc("/v1/chat/completions", func(w http.ResponseWriter, r
*http.Request) {
+ if r.Method != http.MethodPost {
+ writeJSON(w, http.StatusMethodNotAllowed,
map[string]string{"error": "method not allowed"})
+ return
+ }
+
+ var req llmRequest
+ _ = json.NewDecoder(r.Body).Decode(&req)
+ if req.Model == "" {
+ req.Model = "mock-model"
+ }
+
+ stats.mu.Lock()
+ stats.chatCalls++
+ stats.mu.Unlock()
+
+ resp := llmResponse{
+ ID: nextEventID("chatcmpl"),
+ Object: "chat.completion",
+ Model: req.Model,
+ ServedBy: engineID,
+ Choices: []llmChoice{{
+ Index: 0,
+ Message: llmMessage{
+ Role: "assistant",
+ Content: fmt.Sprintf("mock response
from %s", engineID),
+ },
+ }},
+ Usage: map[string]int{"prompt_tokens": 8,
"completion_tokens": 8, "total_tokens": 16},
+ }
+ writeJSON(w, http.StatusOK, resp)
+ })
+
+ srv := &http.Server{Addr: addr, Handler: mux, ReadHeaderTimeout: 3 *
time.Second}
+ log.Printf("[mock-engine-a] listening on %s", addr)
+ if err := srv.ListenAndServe(); err != nil && err !=
http.ErrServerClosed {
+ log.Fatalf("[mock-engine-a] server failed: %v", err)
+ }
+}
+
+func extractPrompt(r *http.Request) string {
+ if r == nil || r.Body == nil {
+ return ""
+ }
+ var payload map[string]any
+ if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
+ return ""
+ }
+ if prompt, ok := payload["prompt"]; ok {
+ switch v := prompt.(type) {
+ case string:
+ return strings.TrimSpace(v)
+ case []any:
+ parts := make([]string, 0, len(v))
+ for _, item := range v {
+ if str, ok := item.(string); ok {
+ parts = append(parts, str)
+ }
+ }
+ return strings.Join(parts, "\n")
+ }
+ }
+ messages, ok := payload["messages"].([]any)
+ if !ok {
+ return ""
+ }
+ parts := make([]string, 0, len(messages))
+ for _, item := range messages {
+ msgMap, ok := item.(map[string]any)
+ if !ok {
+ continue
+ }
+ if content, ok := msgMap["content"].(string); ok {
+ parts = append(parts, content)
+ }
+ }
+ return strings.Join(parts, "\n")
+}
+
+func tokenizePrompt(prompt string) []int {
+ prompt = strings.TrimSpace(prompt)
+ if prompt == "" {
+ return []int{0}
+ }
+ words := strings.Fields(prompt)
+ if len(words) == 0 {
+ return []int{0}
+ }
+ tokens := make([]int, 0, len(words))
+ for idx, word := range words {
+ sum := 0
+ for _, r := range word {
+ sum += int(r)
+ }
+ tokens = append(tokens, (sum%997)+idx+1)
+ }
+ return tokens
+}
+
+func writeJSON(w http.ResponseWriter, status int, payload any) {
+ w.Header().Set("Content-Type", "application/json")
+ w.WriteHeader(status)
+ _ = json.NewEncoder(w).Encode(payload)
+}
+
+func nextEventID(prefix string) string {
+ n := atomic.AddUint64(&eventCounter, 1)
+ return prefix + "-" + strconv.FormatUint(n, 10)
+}
+
+func envOrDefault(key string, fallback string) string {
+ val, ok := os.LookupEnv(key)
+ if !ok || strings.TrimSpace(val) == "" {
+ return fallback
+ }
+ return val
+}
diff --git a/ai/kvcache/mock/server/engine-b/main.go
b/ai/kvcache/mock/server/engine-b/main.go
new file mode 100644
index 0000000..c5ebcab
--- /dev/null
+++ b/ai/kvcache/mock/server/engine-b/main.go
@@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package main
+
+import (
+ "encoding/json"
+ "fmt"
+ "log"
+ "net/http"
+ "os"
+ "strconv"
+ "strings"
+ "sync"
+ "sync/atomic"
+ "time"
+)
+
+type engineStats struct {
+ mu sync.Mutex
+
+ chatCalls int
+}
+
+type llmRequest struct {
+ Model string `json:"model"`
+}
+
+type llmResponse struct {
+ ID string `json:"id"`
+ Object string `json:"object"`
+ Model string `json:"model"`
+ ServedBy string `json:"served_by"`
+ Choices []llmChoice `json:"choices"`
+ Usage map[string]int `json:"usage"`
+}
+
+type llmChoice struct {
+ Index int `json:"index"`
+ Message llmMessage `json:"message"`
+}
+
+type llmMessage struct {
+ Role string `json:"role"`
+ Content string `json:"content"`
+}
+
+var eventCounter uint64
+
+func main() {
+ addr := envOrDefault("LLM_B_ADDR", ":18092")
+ engineID := envOrDefault("LLM_B_ID", "mock-llm-b")
+ stats := &engineStats{}
+
+ mux := http.NewServeMux()
+ mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
+ writeJSON(w, http.StatusOK, map[string]any{"ok": true,
"engine": engineID, "tokenize_enabled": false})
+ })
+ mux.HandleFunc("/stats", func(w http.ResponseWriter, _ *http.Request) {
+ stats.mu.Lock()
+ defer stats.mu.Unlock()
+ writeJSON(w, http.StatusOK, map[string]any{
+ "chat_calls": stats.chatCalls,
+ "engine_id": engineID,
+ "timestamp_unix_milli": time.Now().UnixMilli(),
+ })
+ })
+ mux.HandleFunc("/reset", func(w http.ResponseWriter, r *http.Request) {
+ if r.Method != http.MethodPost {
+ writeJSON(w, http.StatusMethodNotAllowed,
map[string]string{"error": "method not allowed"})
+ return
+ }
+ stats.mu.Lock()
+ stats.chatCalls = 0
+ stats.mu.Unlock()
+ writeJSON(w, http.StatusOK, map[string]any{"ok": true})
+ })
+ mux.HandleFunc("/tokenize", func(w http.ResponseWriter, _
*http.Request) {
+ writeJSON(w, http.StatusNotFound, map[string]string{"error":
"tokenize not available on this instance"})
+ })
+ mux.HandleFunc("/v1/chat/completions", func(w http.ResponseWriter, r
*http.Request) {
+ if r.Method != http.MethodPost {
+ writeJSON(w, http.StatusMethodNotAllowed,
map[string]string{"error": "method not allowed"})
+ return
+ }
+
+ var req llmRequest
+ _ = json.NewDecoder(r.Body).Decode(&req)
+ if req.Model == "" {
+ req.Model = "mock-model"
+ }
+
+ stats.mu.Lock()
+ stats.chatCalls++
+ stats.mu.Unlock()
+
+ resp := llmResponse{
+ ID: nextEventID("chatcmpl"),
+ Object: "chat.completion",
+ Model: req.Model,
+ ServedBy: engineID,
+ Choices: []llmChoice{{
+ Index: 0,
+ Message: llmMessage{
+ Role: "assistant",
+ Content: fmt.Sprintf("mock response
from %s", engineID),
+ },
+ }},
+ Usage: map[string]int{"prompt_tokens": 8,
"completion_tokens": 8, "total_tokens": 16},
+ }
+ writeJSON(w, http.StatusOK, resp)
+ })
+
+ srv := &http.Server{Addr: addr, Handler: mux, ReadHeaderTimeout: 3 *
time.Second}
+ log.Printf("[mock-engine-b] listening on %s", addr)
+ if err := srv.ListenAndServe(); err != nil && err !=
http.ErrServerClosed {
+ log.Fatalf("[mock-engine-b] server failed: %v", err)
+ }
+}
+
+func writeJSON(w http.ResponseWriter, status int, payload any) {
+ w.Header().Set("Content-Type", "application/json")
+ w.WriteHeader(status)
+ _ = json.NewEncoder(w).Encode(payload)
+}
+
+func nextEventID(prefix string) string {
+ n := atomic.AddUint64(&eventCounter, 1)
+ return prefix + "-" + strconv.FormatUint(n, 10)
+}
+
+func envOrDefault(key string, fallback string) string {
+ val, ok := os.LookupEnv(key)
+ if !ok || strings.TrimSpace(val) == "" {
+ return fallback
+ }
+ return val
+}
diff --git a/ai/kvcache/mock/test/pixiu_test.go
b/ai/kvcache/mock/test/pixiu_test.go
new file mode 100644
index 0000000..8bfba6a
--- /dev/null
+++ b/ai/kvcache/mock/test/pixiu_test.go
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package test
+
+import (
+ "bytes"
+ "encoding/json"
+ "io"
+ "net/http"
+ "os"
+ "testing"
+ "time"
+)
+
+const (
+ defaultPixiuURL = "http://127.0.0.1:18888"
+ defaultControllerURL = "http://127.0.0.1:18081"
+ defaultEngineAURL = "http://127.0.0.1:18091"
+ defaultEngineBURL = "http://127.0.0.1:18092"
+)
+
+var testHTTPClient = &http.Client{Timeout: 3 * time.Second}
+
+func getEnvOrDefault(key string, fallback string) string {
+ if v := os.Getenv(key); v != "" {
+ return v
+ }
+ return fallback
+}
+
+func checkServiceAvailable(url string) bool {
+ resp, err := testHTTPClient.Get(url)
+ if err != nil {
+ return false
+ }
+ defer resp.Body.Close()
+ return resp.StatusCode >= 200 && resp.StatusCode < 300
+}
+
+func checkPixiuAvailable(url string) bool {
+ payload := map[string]any{
+ "model": "mock-model",
+ "prompt": "kvcache availability probe",
+ }
+ data, err := json.Marshal(payload)
+ if err != nil {
+ return false
+ }
+ req, err := http.NewRequest(http.MethodPost,
url+"/v1/chat/completions", bytes.NewReader(data))
+ if err != nil {
+ return false
+ }
+ req.Header.Set("Content-Type", "application/json")
+ resp, err := testHTTPClient.Do(req)
+ if err != nil {
+ return false
+ }
+ defer resp.Body.Close()
+ return resp.StatusCode >= 200 && resp.StatusCode < 600
+}
+
+func postJSON(t *testing.T, url string, payload any) map[string]any {
+ t.Helper()
+ data, err := json.Marshal(payload)
+ if err != nil {
+ t.Fatalf("marshal payload failed: %v", err)
+ }
+
+ req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(data))
+ if err != nil {
+ t.Fatalf("create request failed: %v", err)
+ }
+ req.Header.Set("Content-Type", "application/json")
+
+ resp, err := testHTTPClient.Do(req)
+ if err != nil {
+ t.Fatalf("request failed: %v", err)
+ }
+ defer resp.Body.Close()
+ body, _ := io.ReadAll(resp.Body)
+ if resp.StatusCode < 200 || resp.StatusCode >= 300 {
+ t.Fatalf("request status %d: %s", resp.StatusCode, string(body))
+ }
+
+ var out map[string]any
+ if err := json.Unmarshal(body, &out); err != nil {
+ t.Fatalf("unmarshal response failed: %v body=%s", err,
string(body))
+ }
+ return out
+}
+
+func getJSON(t *testing.T, url string) map[string]any {
+ t.Helper()
+ resp, err := testHTTPClient.Get(url)
+ if err != nil {
+ t.Fatalf("get request failed: %v", err)
+ }
+ defer resp.Body.Close()
+ body, _ := io.ReadAll(resp.Body)
+ if resp.StatusCode < 200 || resp.StatusCode >= 300 {
+ t.Fatalf("get status %d: %s", resp.StatusCode, string(body))
+ }
+
+ var out map[string]any
+ if err := json.Unmarshal(body, &out); err != nil {
+ t.Fatalf("unmarshal response failed: %v body=%s", err,
string(body))
+ }
+ return out
+}
+
+func toInt(t *testing.T, value any, key string) int {
+ t.Helper()
+ num, ok := value.(float64)
+ if !ok {
+ t.Fatalf("%s is not numeric: %#v", key, value)
+ }
+ return int(num)
+}
+
+func TestKVCacheMockRoutingFlow(t *testing.T) {
+ pixiuURL := getEnvOrDefault("PIXIU_URL", defaultPixiuURL)
+ controllerURL := getEnvOrDefault("CONTROLLER_URL", defaultControllerURL)
+ engineAURL := getEnvOrDefault("ENGINE_A_URL", defaultEngineAURL)
+ engineBURL := getEnvOrDefault("ENGINE_B_URL", defaultEngineBURL)
+
+ if !checkServiceAvailable(controllerURL+"/health") ||
+ !checkServiceAvailable(engineAURL+"/health") ||
+ !checkServiceAvailable(engineBURL+"/health") ||
+ !checkPixiuAvailable(pixiuURL) {
+ t.Skip("required services are unavailable; start mock servers
and pixiu first")
+ }
+
+ postJSON(t, controllerURL+"/reset", map[string]any{})
+ postJSON(t, engineAURL+"/reset", map[string]any{})
+ postJSON(t, engineBURL+"/reset", map[string]any{})
+
+ payload := map[string]any{
+ "model": "mock-model",
+ "messages": []map[string]any{{
+ "role": "user",
+ "content": "please route same prompt for kvcache test",
+ }},
+ }
+
+ postJSON(t, pixiuURL+"/v1/chat/completions", payload)
+ time.Sleep(600 * time.Millisecond)
+ second := postJSON(t, pixiuURL+"/v1/chat/completions", payload)
+
+ servedBy, _ := second["served_by"].(string)
+ if servedBy != "mock-llm-b" {
+ t.Fatalf("expected served_by mock-llm-b, got %q", servedBy)
+ }
+
+ time.Sleep(1 * time.Second)
+ controllerStats := getJSON(t, controllerURL+"/stats")
+ engineAStats := getJSON(t, engineAURL+"/stats")
+ engineBStats := getJSON(t, engineBURL+"/stats")
+
+ if got := toInt(t, engineAStats["tokenize_calls"], "tokenize_calls");
got < 1 {
+ t.Fatalf("expected tokenize_calls >= 1, got %d", got)
+ }
+ if got := toInt(t, controllerStats["lookup_calls"], "lookup_calls");
got < 2 {
+ t.Fatalf("expected lookup_calls >= 2, got %d", got)
+ }
+ if got := toInt(t, controllerStats["pin_calls"], "pin_calls"); got < 1 {
+ t.Fatalf("expected pin_calls >= 1, got %d", got)
+ }
+ if got := toInt(t, engineBStats["chat_calls"], "chat_calls"); got < 1 {
+ t.Fatalf("expected engine-b chat_calls >= 1, got %d", got)
+ }
+}
diff --git a/ai/kvcache/real-engine/README.md b/ai/kvcache/real-engine/README.md
new file mode 100644
index 0000000..da15ac4
--- /dev/null
+++ b/ai/kvcache/real-engine/README.md
@@ -0,0 +1,45 @@
+# KVCache BYOE (Real Engine) Sample
+
+[Back to KVCache Index](../README.md) | English | [中文](./README_zh.md)
+
+## Layout
+
+- `pixiu/conf.yaml`: BYOE Pixiu template (env-driven)
+- `request.sh`: manual request script
+- `test/pixiu_test.go`: smoke test (requires external BYOE env)
+- `verify.sh`: verification script for metrics-oriented validation
+
+## Prerequisites
+
+- reachable `VLLM_ENDPOINT` with `/tokenize`
+- reachable `LMCACHE_ENDPOINT` with `/lookup` `/pin` `/compress` `/evict`
+- Pixiu binary or Pixiu source repo
+
+## Quick Start (CLI)
+
+1. Configure env and run verification:
+
+```bash
+cd ai/kvcache/real-engine
+export VLLM_ENDPOINT="http://<vllm-host>:<port>"
+export LMCACHE_ENDPOINT="http://<lmcache-host>:<port>"
+./verify.sh
+```
+
+2. Send manual request:
+
+```bash
+./request.sh
+```
+
+3. Run test case:
+
+```bash
+go test -v ./test/pixiu_test.go
+```
+
+## Acceptance Metrics
+
+- lookup success rate
+- preferred endpoint hit rate
+- p95 latency comparison (without cache vs with cache)
diff --git a/ai/kvcache/real-engine/README_zh.md
b/ai/kvcache/real-engine/README_zh.md
new file mode 100644
index 0000000..8d82d3f
--- /dev/null
+++ b/ai/kvcache/real-engine/README_zh.md
@@ -0,0 +1,45 @@
+# KVCache BYOE(真实引擎)示例
+
+[返回 KVCache 索引](../README_zh.md) | [English](./README.md) | 中文
+
+## 目录结构
+
+- `pixiu/conf.yaml`: BYOE Pixiu 模板(环境变量驱动)
+- `request.sh`: 手工请求脚本
+- `test/pixiu_test.go`: 冒烟测试(依赖外部 BYOE 环境)
+- `verify.sh`: 指标验证脚本
+
+## 前置条件
+
+- 可访问的 `VLLM_ENDPOINT`(包含 `/tokenize`)
+- 可访问的 `LMCACHE_ENDPOINT`(包含 `/lookup` `/pin` `/compress` `/evict`)
+- 可用 Pixiu 二进制或 Pixiu 源码仓库
+
+## 命令行快速开始
+
+1. 配置环境变量并执行验证:
+
+```bash
+cd ai/kvcache/real-engine
+export VLLM_ENDPOINT="http://<vllm-host>:<port>"
+export LMCACHE_ENDPOINT="http://<lmcache-host>:<port>"
+./verify.sh
+```
+
+2. 手工发请求:
+
+```bash
+./request.sh
+```
+
+3. 运行测试用例:
+
+```bash
+go test -v ./test/pixiu_test.go
+```
+
+## 验收指标
+
+- lookup 成功率
+- preferred endpoint 命中率
+- p95 延迟对比(无缓存 vs 有缓存)
diff --git a/ai/kvcache/real-engine/pixiu/conf.yaml
b/ai/kvcache/real-engine/pixiu/conf.yaml
new file mode 100644
index 0000000..4c5e56c
--- /dev/null
+++ b/ai/kvcache/real-engine/pixiu/conf.yaml
@@ -0,0 +1,101 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+---
+# This file is intended to be rendered by verify.sh (envsubst).
+static_resources:
+ listeners:
+ - name: "kvcache_real_engine"
+ protocol_type: "HTTP"
+ address:
+ socket_address:
+ address: "0.0.0.0"
+ port: ${PIXIU_LISTEN_PORT}
+ filter_chains:
+ filters:
+ - name: dgp.filter.httpconnectionmanager
+ config:
+ route_config:
+ routes:
+ - match:
+ prefix: "/v1/chat/completions"
+ route:
+ cluster: "real_llm"
+ cluster_not_found_response_code: 505
+ http_filters:
+ - name: dgp.filter.ai.kvcache
+ config:
+ enabled: true
+ vllm_endpoint: "${VLLM_ENDPOINT}"
+ lmcache_endpoint: "${LMCACHE_ENDPOINT}"
+ default_model: "${MODEL_NAME}"
+ request_timeout: 5s
+ lookup_routing_timeout: 150ms
+ hot_window: 5m
+ hot_max_records: 500
+ token_cache:
+ enabled: true
+ max_size: 20000
+ ttl: 20m
+ cache_strategy:
+ enable_compression: true
+ enable_pinning: true
+ enable_eviction: true
+ memory_threshold: ${MEMORY_THRESHOLD}
+ hot_content_threshold: ${HOT_CONTENT_THRESHOLD}
+ load_threshold: ${LOAD_THRESHOLD}
+ pin_instance_id: "${PIN_INSTANCE_ID}"
+ pin_location: "${PIN_LOCATION}"
+ compress_instance_id: "${COMPRESS_INSTANCE_ID}"
+ compress_location: "${COMPRESS_LOCATION}"
+ compress_method: "zstd"
+ evict_instance_id: "${EVICT_INSTANCE_ID}"
+ - name: dgp.filter.llm.proxy
+ config:
+ timeout: 90s
+ maxIdleConns: 200
+ maxIdleConnsPerHost: 200
+ maxConnsPerHost: 200
+ scheme: "${LLM_SCHEME}"
+ config:
+ idle_timeout: 120s
+ read_timeout: 120s
+ write_timeout: 120s
+
+ clusters:
+ - name: "real_llm"
+ lb_policy: "round_robin"
+ endpoints:
+ - id: "${ENGINE_ENDPOINT_A_ID}"
+ socket_address:
+ address: "${ENGINE_ENDPOINT_A_HOST}"
+ port: ${ENGINE_ENDPOINT_A_PORT}
+ - id: "${ENGINE_ENDPOINT_B_ID}"
+ socket_address:
+ address: "${ENGINE_ENDPOINT_B_HOST}"
+ port: ${ENGINE_ENDPOINT_B_PORT}
+
+ shutdown_config:
+ timeout: "30s"
+ step_timeout: "5s"
+ reject_policy: "immediacy"
+
+metric:
+ enable: true
+ prometheus_port: ${PIXIU_PROM_PORT}
diff --git a/ai/kvcache/real-engine/request.sh
b/ai/kvcache/real-engine/request.sh
new file mode 100755
index 0000000..6b827c8
--- /dev/null
+++ b/ai/kvcache/real-engine/request.sh
@@ -0,0 +1,37 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+#!/usr/bin/env bash
+set -euo pipefail
+
+PIXIU_URL="${PIXIU_URL:-http://127.0.0.1:18889}"
+MODEL_NAME="${MODEL_NAME:-Qwen2.5-3B-Instruct}"
+
+export NO_PROXY="${NO_PROXY:-127.0.0.1,localhost}"
+export no_proxy="${no_proxy:-127.0.0.1,localhost}"
+
+body=$(cat <<JSON
+{"model":"${MODEL_NAME}","messages":[{"role":"user","content":"kvcache byoe
verification request"}]}
+JSON
+)
+
+curl -sS -H 'Content-Type: application/json' \
+ -X POST "${PIXIU_URL}/v1/chat/completions" \
+ -d "${body}"
+echo ""
diff --git a/ai/kvcache/real-engine/test/pixiu_test.go
b/ai/kvcache/real-engine/test/pixiu_test.go
new file mode 100644
index 0000000..8b794b8
--- /dev/null
+++ b/ai/kvcache/real-engine/test/pixiu_test.go
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test
+
+import (
+ "bytes"
+ "encoding/json"
+ "io"
+ "net/http"
+ "os"
+ "testing"
+ "time"
+)
+
+const defaultBYOEPixiuURL = "http://127.0.0.1:18889"
+
+var testHTTPClient = &http.Client{Timeout: 3 * time.Second}
+
+func envOrDefault(key string, fallback string) string {
+ if v := os.Getenv(key); v != "" {
+ return v
+ }
+ return fallback
+}
+
+func checkPixiuAvailable(url string) bool {
+ payload := map[string]any{
+ "model": envOrDefault("MODEL_NAME", "Qwen2.5-3B-Instruct"),
+ "prompt": "kvcache byoe availability probe",
+ }
+ data, err := json.Marshal(payload)
+ if err != nil {
+ return false
+ }
+ req, err := http.NewRequest(http.MethodPost,
url+"/v1/chat/completions", bytes.NewReader(data))
+ if err != nil {
+ return false
+ }
+ req.Header.Set("Content-Type", "application/json")
+ resp, err := testHTTPClient.Do(req)
+ if err != nil {
+ return false
+ }
+ defer resp.Body.Close()
+ return resp.StatusCode >= 200 && resp.StatusCode < 600
+}
+
+func TestBYOEEnvironmentAndGatewayAvailability(t *testing.T) {
+ if os.Getenv("VLLM_ENDPOINT") == "" || os.Getenv("LMCACHE_ENDPOINT") ==
"" {
+ t.Skip("VLLM_ENDPOINT/LMCACHE_ENDPOINT not set; BYOE
environment not configured")
+ }
+
+ pixiuURL := envOrDefault("PIXIU_URL", defaultBYOEPixiuURL)
+ if !checkPixiuAvailable(pixiuURL) {
+ t.Skip("pixiu gateway is unavailable; start pixiu with
ai/kvcache/real-engine/pixiu/conf.yaml first")
+ }
+}
+
+func TestBYOERequestSmoke(t *testing.T) {
+ if os.Getenv("VLLM_ENDPOINT") == "" || os.Getenv("LMCACHE_ENDPOINT") ==
"" {
+ t.Skip("VLLM_ENDPOINT/LMCACHE_ENDPOINT not set; BYOE
environment not configured")
+ }
+
+ pixiuURL := envOrDefault("PIXIU_URL", defaultBYOEPixiuURL)
+ if !checkPixiuAvailable(pixiuURL) {
+ t.Skip("pixiu gateway is unavailable; start pixiu first")
+ }
+
+ payload := map[string]any{
+ "model": envOrDefault("MODEL_NAME", "Qwen2.5-3B-Instruct"),
+ "messages": []map[string]any{{
+ "role": "user",
+ "content": "kvcache byoe smoke test",
+ }},
+ }
+ data, err := json.Marshal(payload)
+ if err != nil {
+ t.Fatalf("marshal payload failed: %v", err)
+ }
+ req, err := http.NewRequest(http.MethodPost,
pixiuURL+"/v1/chat/completions", bytes.NewReader(data))
+ if err != nil {
+ t.Fatalf("create request failed: %v", err)
+ }
+ req.Header.Set("Content-Type", "application/json")
+
+ resp, err := testHTTPClient.Do(req)
+ if err != nil {
+ t.Fatalf("request failed: %v", err)
+ }
+ defer resp.Body.Close()
+ body, _ := io.ReadAll(resp.Body)
+ if resp.StatusCode < 200 || resp.StatusCode >= 300 {
+ t.Fatalf("unexpected status %d: %s", resp.StatusCode,
string(body))
+ }
+}
diff --git a/ai/kvcache/real-engine/verify.sh b/ai/kvcache/real-engine/verify.sh
new file mode 100755
index 0000000..0ff0f38
--- /dev/null
+++ b/ai/kvcache/real-engine/verify.sh
@@ -0,0 +1,372 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PIXIU_SOURCE="${PIXIU_SOURCE:-}"
+START_PIXIU="${START_PIXIU:-1}"
+GO_CACHE_DIR="${GO_CACHE_DIR:-/tmp/go-build-cache}"
+GO_MOD_CACHE_DIR="${GO_MOD_CACHE_DIR:-/tmp/go-mod-cache}"
+KEEP_WORK_DIR="${KEEP_WORK_DIR:-0}"
+export NO_PROXY="${NO_PROXY:-127.0.0.1,localhost}"
+export no_proxy="${no_proxy:-127.0.0.1,localhost}"
+
+required_cmds=(curl jq envsubst awk sort)
+for cmd in "${required_cmds[@]}"; do
+ if ! command -v "${cmd}" >/dev/null 2>&1; then
+ echo "missing required command: ${cmd}"
+ exit 1
+ fi
+done
+
+require_env() {
+ local key="$1"
+ if [[ -z "${!key:-}" ]]; then
+ echo "missing required env: ${key}"
+ exit 1
+ fi
+}
+
+parse_url_scheme() {
+ local url="$1"
+ if [[ "${url}" == https://* ]]; then
+ echo "https"
+ else
+ echo "http"
+ fi
+}
+
+parse_url_host() {
+ local url="$1"
+ local tmp="${url#*://}"
+ tmp="${tmp%%/*}"
+ echo "${tmp%%:*}"
+}
+
+parse_url_port() {
+ local url="$1"
+ local tmp="${url#*://}"
+ tmp="${tmp%%/*}"
+ if [[ "${tmp}" == *:* ]]; then
+ echo "${tmp##*:}"
+ return
+ fi
+ if [[ "${url}" == https://* ]]; then
+ echo "443"
+ else
+ echo "80"
+ fi
+}
+
+require_env VLLM_ENDPOINT
+require_env LMCACHE_ENDPOINT
+
+export PIXIU_LISTEN_PORT="${PIXIU_LISTEN_PORT:-18889}"
+export PIXIU_PROM_PORT="${PIXIU_PROM_PORT:-2223}"
+export MODEL_NAME="${MODEL_NAME:-Qwen2.5-3B-Instruct}"
+export HOT_CONTENT_THRESHOLD="${HOT_CONTENT_THRESHOLD:-3}"
+export LOAD_THRESHOLD="${LOAD_THRESHOLD:-0.7}"
+export MEMORY_THRESHOLD="${MEMORY_THRESHOLD:-0.8}"
+export LLM_SCHEME="${LLM_SCHEME:-$(parse_url_scheme "${VLLM_ENDPOINT}")}" #
override to https if needed
+
+export ENGINE_ENDPOINT_A_ID="${ENGINE_ENDPOINT_A_ID:-engine-a}"
+export ENGINE_ENDPOINT_B_ID="${ENGINE_ENDPOINT_B_ID:-engine-b}"
+export ENGINE_ENDPOINT_A_HOST="${ENGINE_ENDPOINT_A_HOST:-$(parse_url_host
"${VLLM_ENDPOINT}")}"
+export ENGINE_ENDPOINT_A_PORT="${ENGINE_ENDPOINT_A_PORT:-$(parse_url_port
"${VLLM_ENDPOINT}")}"
+export
ENGINE_ENDPOINT_B_HOST="${ENGINE_ENDPOINT_B_HOST:-${ENGINE_ENDPOINT_A_HOST}}"
+export
ENGINE_ENDPOINT_B_PORT="${ENGINE_ENDPOINT_B_PORT:-${ENGINE_ENDPOINT_A_PORT}}"
+
+export PIN_INSTANCE_ID="${PIN_INSTANCE_ID:-${ENGINE_ENDPOINT_A_ID}}"
+export PIN_LOCATION="${PIN_LOCATION:-lmcache}"
+export COMPRESS_INSTANCE_ID="${COMPRESS_INSTANCE_ID:-${ENGINE_ENDPOINT_B_ID}}"
+export COMPRESS_LOCATION="${COMPRESS_LOCATION:-lmcache}"
+export EVICT_INSTANCE_ID="${EVICT_INSTANCE_ID:-${ENGINE_ENDPOINT_A_ID}}"
+
+PIXIU_URL="${PIXIU_URL:-http://127.0.0.1:${PIXIU_LISTEN_PORT}}"
+PROM_URL="${PROM_URL:-http://127.0.0.1:${PIXIU_PROM_PORT}/metrics}"
+
+WORK_DIR="$(mktemp -d /tmp/kvcache-real-verify-XXXXXX)"
+RENDERED_CONFIG="${WORK_DIR}/pixiu.rendered.yaml"
+PIXIU_LOG="${WORK_DIR}/pixiu.log"
+
+BASELINE_ROUNDS="${BASELINE_ROUNDS:-12}"
+CACHED_ROUNDS="${CACHED_ROUNDS:-12}"
+BASELINE_FILE="${WORK_DIR}/baseline.times"
+CACHED_FILE="${WORK_DIR}/cached.times"
+
+PIXIU_PID=""
+
+cleanup() {
+ set +e
+ local exit_code=$?
+ if [[ -n "${PIXIU_PID}" ]] && kill -0 "${PIXIU_PID}" >/dev/null 2>&1; then
+ kill "${PIXIU_PID}" >/dev/null 2>&1
+ wait "${PIXIU_PID}" >/dev/null 2>&1
+ fi
+ if [[ "${KEEP_WORK_DIR}" != "1" && "${exit_code}" -eq 0 && -n
"${WORK_DIR:-}" && -d "${WORK_DIR}" ]]; then
+ rm -rf "${WORK_DIR}"
+ fi
+ return "${exit_code}"
+}
+trap cleanup EXIT INT TERM
+
+wait_for_pixiu() {
+ local body
+ body="$(jq -nc --arg model "${MODEL_NAME}" '{model:$model,prompt:"kvcache
smoke"}')"
+ for _ in $(seq 1 100); do
+ local code
+ code="$(curl -s -o /dev/null -w '%{http_code}' \
+ -H 'Content-Type: application/json' \
+ -X POST "${PIXIU_URL}/v1/chat/completions" \
+ -d "${body}" || true)"
+ if [[ "${code}" == "200" || "${code}" == "4"* || "${code}" == "5"* ]]; then
+ return 0
+ fi
+ sleep 0.3
+ done
+ return 1
+}
+
+start_pixiu() {
+ if command -v pixiu >/dev/null 2>&1; then
+ pixiu gateway start -c "${RENDERED_CONFIG}" >"${PIXIU_LOG}" 2>&1 &
+ PIXIU_PID="$!"
+ elif [[ -d "${PIXIU_SOURCE}/cmd/pixiu" ]]; then
+ (
+ cd "${PIXIU_SOURCE}"
+ env GOCACHE="${GO_CACHE_DIR}" GOMODCACHE="${GO_MOD_CACHE_DIR}" go run
./cmd/pixiu/*.go gateway start -c "${RENDERED_CONFIG}"
+ ) >"${PIXIU_LOG}" 2>&1 &
+ PIXIU_PID="$!"
+ else
+ echo "cannot find pixiu binary or source. set PIXIU_SOURCE or install
pixiu in PATH"
+ exit 1
+ fi
+
+ if ! wait_for_pixiu; then
+ echo "pixiu not ready, log: ${PIXIU_LOG}"
+ exit 1
+ fi
+}
+
+render_config() {
+ envsubst <"${SCRIPT_DIR}/pixiu/conf.yaml" >"${RENDERED_CONFIG}"
+}
+
+p95() {
+ local file="$1"
+ awk 'NF {print $1}' "${file}" | sort -n | awk '
+ {
+ arr[NR] = $1
+ }
+ END {
+ if (NR == 0) {
+ print "0"
+ exit
+ }
+ idx = int((NR * 95 + 99) / 100)
+ if (idx < 1) idx = 1
+ if (idx > NR) idx = NR
+ print arr[idx]
+ }'
+}
+
+avg() {
+ local file="$1"
+ awk '{sum += $1; n += 1} END {if (n == 0) {print "0"} else {printf "%.6f",
sum / n}}' "${file}"
+}
+
+run_load() {
+ local mode="$1"
+ local rounds="$2"
+ local output_file="$3"
+ local fixed_prompt="kvcache route preference probe"
+
+ : >"${output_file}"
+ for i in $(seq 1 "${rounds}"); do
+ local prompt
+ if [[ "${mode}" == "baseline" ]]; then
+ prompt="${fixed_prompt} baseline-${i}-$(date +%s%N)"
+ else
+ prompt="${fixed_prompt}"
+ fi
+
+ local body
+ body="$(jq -nc --arg model "${MODEL_NAME}" --arg prompt "${prompt}"
'{model:$model,messages:[{role:"user",content:$prompt}]}')"
+
+ local result
+ local response_file="${WORK_DIR}/${mode}-${i}.response.out"
+ result="$(curl -sS -o "${response_file}" -w '%{http_code} %{time_total}' \
+ -H 'Content-Type: application/json' \
+ -X POST "${PIXIU_URL}/v1/chat/completions" \
+ -d "${body}")"
+
+ local status
+ status="${result%% *}"
+ local timing
+ timing="${result##* }"
+
+ if [[ "${status}" != "200" ]]; then
+ echo "request failed in ${mode} mode: status=${status}"
+ cat "${response_file}"
+ exit 1
+ fi
+
+ echo "${timing}" >>"${output_file}"
+ done
+}
+
+lookup_probe() {
+ local tokenize_body
+ tokenize_body="$(jq -nc --arg model "${MODEL_NAME}"
'{model:$model,prompt:"kvcache lookup probe"}')"
+
+ local tokenize_resp_file="${WORK_DIR}/lookup-probe-tokenize.response.json"
+ local tokenize_status
+ tokenize_status="$(curl -sS -o "${tokenize_resp_file}" -w '%{http_code}' \
+ -H 'Content-Type: application/json' \
+ -X POST "${VLLM_ENDPOINT}/tokenize" \
+ -d "${tokenize_body}")"
+ if [[ "${tokenize_status}" != "200" ]]; then
+ echo "lookup_probe_error: tokenize returned HTTP ${tokenize_status}"
+ cat "${tokenize_resp_file}"
+ exit 1
+ fi
+
+ local tokens_json
+ tokens_json="$(jq -c 'if type == "object" then (.tokens // []) else [] end'
"${tokenize_resp_file}" 2>/dev/null || echo '[]')"
+ if [[ "${tokens_json}" == "[]" ]]; then
+ echo "lookup_probe_error: tokenize response did not contain usable tokens"
+ cat "${tokenize_resp_file}"
+ exit 1
+ fi
+
+ local lookup_body
+ lookup_body="$(jq -nc --argjson t "${tokens_json}" '{tokens:$t}')"
+ local lookup_resp_file="${WORK_DIR}/lookup-probe-lookup.response.json"
+ local lookup_status
+ lookup_status="$(curl -sS -o "${lookup_resp_file}" -w '%{http_code}' \
+ -H 'Content-Type: application/json' \
+ -X POST "${LMCACHE_ENDPOINT}/lookup" \
+ -d "${lookup_body}")"
+ if [[ "${lookup_status}" != "200" ]]; then
+ echo "lookup_probe_error: lookup returned HTTP ${lookup_status}"
+ cat "${lookup_resp_file}"
+ exit 1
+ fi
+
+ local preferred
+ preferred="$(jq -r 'if type == "object" then ((.layout_info // {}) |
to_entries | max_by(.value["1"]) | .key) else empty end // empty'
"${lookup_resp_file}" 2>/dev/null || true)"
+ if [[ -z "${preferred}" ]]; then
+ echo "lookup_probe_error: cannot parse preferred endpoint from lookup
response"
+ cat "${lookup_resp_file}"
+ exit 1
+ fi
+
+ echo "${preferred}"
+}
+
+calc_preferred_hit_rate() {
+ local preferred_id="$1"
+ local preferred_addr=""
+
+ if [[ "${preferred_id}" == "${ENGINE_ENDPOINT_A_ID}" ]]; then
+ preferred_addr="${ENGINE_ENDPOINT_A_HOST}:${ENGINE_ENDPOINT_A_PORT}"
+ elif [[ "${preferred_id}" == "${ENGINE_ENDPOINT_B_ID}" ]]; then
+ preferred_addr="${ENGINE_ENDPOINT_B_HOST}:${ENGINE_ENDPOINT_B_PORT}"
+ else
+ echo "0 0 ${preferred_id}"
+ return
+ fi
+
+ local metrics
+ metrics="$(curl -sS "${PROM_URL}" || true)"
+
+ local preferred_hits
+ preferred_hits="$(awk -v addr="${preferred_addr}" '
+ /^pixiu_llm_upstream_requests_total/ && index($0, "endpoint_address=\""
addr "\"") > 0 {sum += $NF}
+ END {printf "%.0f", sum}
+ ' <<<"${metrics}")"
+
+ local total_hits
+ total_hits="$(awk -v
addr_a="${ENGINE_ENDPOINT_A_HOST}:${ENGINE_ENDPOINT_A_PORT}" -v
addr_b="${ENGINE_ENDPOINT_B_HOST}:${ENGINE_ENDPOINT_B_PORT}" '
+ /^pixiu_llm_upstream_requests_total/ && (index($0, "endpoint_address=\""
addr_a "\"") > 0 || index($0, "endpoint_address=\"" addr_b "\"") > 0) {sum +=
$NF}
+ END {printf "%.0f", sum}
+ ' <<<"${metrics}")"
+
+ echo "${preferred_hits} ${total_hits} ${preferred_addr}"
+}
+
+echo "[1/5] rendering pixiu config"
+render_config
+echo "rendered config: ${RENDERED_CONFIG}"
+
+if [[ "${START_PIXIU}" == "1" ]]; then
+ echo "[2/5] starting pixiu"
+ start_pixiu
+else
+ echo "[2/5] using existing pixiu at ${PIXIU_URL}"
+fi
+
+echo "[3/5] probing lookup/preferred endpoint"
+preferred_id="$(lookup_probe)"
+echo "lookup preferred endpoint id: ${preferred_id}"
+
+echo "[4/5] running latency workloads (baseline vs cached)"
+run_load baseline "${BASELINE_ROUNDS}" "${BASELINE_FILE}"
+run_load cached "${CACHED_ROUNDS}" "${CACHED_FILE}"
+
+baseline_p95="$(p95 "${BASELINE_FILE}")"
+cached_p95="$(p95 "${CACHED_FILE}")"
+baseline_avg="$(avg "${BASELINE_FILE}")"
+cached_avg="$(avg "${CACHED_FILE}")"
+
+improvement_pct="$(awk -v b="${baseline_p95}" -v c="${cached_p95}" 'BEGIN { if
(b <= 0) {print "0.00"} else {printf "%.2f", ((b-c)/b)*100 } }')"
+
+echo "[5/5] computing preferred endpoint hit rate"
+read -r preferred_hits total_hits preferred_addr < <(calc_preferred_hit_rate
"${preferred_id}")
+hit_rate="$(awk -v p="${preferred_hits}" -v t="${total_hits}" 'BEGIN { if (t
<= 0) {print "0.00"} else {printf "%.2f", (p/t)*100 } }')"
+
+cat <<REPORT
+
+=== KVCache Real-Engine Verification Report ===
+Pixiu URL: ${PIXIU_URL}
+Prometheus URL: ${PROM_URL}
+Preferred endpoint id (from lookup): ${preferred_id}
+Preferred endpoint addr (mapped): ${preferred_addr}
+
+lookup success: PASS (preferred endpoint parsed)
+preferred endpoint hit: ${preferred_hits}/${total_hits} (${hit_rate}%)
+
+latency baseline avg: ${baseline_avg}s
+latency cached avg: ${cached_avg}s
+latency baseline p95: ${baseline_p95}s
+latency cached p95: ${cached_p95}s
+p95 delta (baseline-cached)/baseline: ${improvement_pct}%
+
+rendered config: ${RENDERED_CONFIG}
+REPORT
+
+if [[ "${total_hits}" == "0" ]]; then
+ echo "WARN: no llm upstream metrics observed. verify prometheus endpoint and
traffic labels."
+fi
+
+if [[ "${START_PIXIU}" == "1" ]]; then
+ echo "pixiu log: ${PIXIU_LOG}"
+fi
diff --git a/start_integrate_test.sh b/start_integrate_test.sh
index 5242f78..eecbeca 100755
--- a/start_integrate_test.sh
+++ b/start_integrate_test.sh
@@ -39,6 +39,8 @@ array=(
# plugins
"plugins/opa/embedded"
"plugins/opa/server-mode"
+ # ai
+ "ai/kvcache/mock"
)
for t in "${array[@]}"; do