This is an automated email from the ASF dual-hosted git repository.

xuetaoli pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/dubbo-go-pixiu-samples.git


The following commit(s) were added to refs/heads/main by this push:
     new 0af1179  feat: add kvcache samples (#123)
0af1179 is described below

commit 0af11792d94776e63d385070f368ca2988c0bd1f
Author: 陈乐樂 <[email protected]>
AuthorDate: Fri Mar 13 11:46:09 2026 +0800

    feat: add kvcache samples (#123)
    
    * feat: add kvcache samples
    
    * fix some problems
    
    * fix:fix some bugs and add headers
    
    * feat:add integrate test
    
    * feat: add descriptions to readme
---
 README.md                                 |   6 +-
 README_CN.md                              |   5 +-
 ai/kvcache/README.md                      |  12 +
 ai/kvcache/README_zh.md                   |  12 +
 ai/kvcache/mock/README.md                 |  40 +++
 ai/kvcache/mock/README_zh.md              |  40 +++
 ai/kvcache/mock/pixiu/conf.yaml           |  97 +++++++
 ai/kvcache/mock/request.sh                |  58 ++++
 ai/kvcache/mock/run.sh                    | 216 ++++++++++++++
 ai/kvcache/mock/server/app/main.go        | 465 ++++++++++++++++++++++++++++++
 ai/kvcache/mock/server/controller/main.go | 197 +++++++++++++
 ai/kvcache/mock/server/engine-a/main.go   | 231 +++++++++++++++
 ai/kvcache/mock/server/engine-b/main.go   | 152 ++++++++++
 ai/kvcache/mock/test/pixiu_test.go        | 185 ++++++++++++
 ai/kvcache/real-engine/README.md          |  45 +++
 ai/kvcache/real-engine/README_zh.md       |  45 +++
 ai/kvcache/real-engine/pixiu/conf.yaml    | 101 +++++++
 ai/kvcache/real-engine/request.sh         |  37 +++
 ai/kvcache/real-engine/test/pixiu_test.go | 110 +++++++
 ai/kvcache/real-engine/verify.sh          | 372 ++++++++++++++++++++++++
 start_integrate_test.sh                   |   2 +
 21 files changed, 2426 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 57afe11..514f239 100644
--- a/README.md
+++ b/README.md
@@ -79,6 +79,10 @@ This project includes multiple samples covering conversions such as HTTP to Dubb
 
  * `authserver`: OAuth2 authorization server implementation providing full authorization code flow with PKCE, JWT token generation, and validation
 
+* **ai**: AI gateway sample collection
+
+  * `kvcache`: Two-layer KVCache samples. `mock` validates Pixiu + kvcache filter integration locally (tokenize/lookup/pin/route hint); `real-engine` is for BYOE integration with real vLLM + LMCache controller.
+
 * **xds**: Pixiu integration with xDS
 
 ## Other Projects in the Dubbo-Go-Pixiu Ecosystem
@@ -102,4 +106,4 @@ If you’d like to add new examples, please follow these steps:
 
 ## License
 
-This project is licensed under the [Apache License 2.0](LICENSE).
\ No newline at end of file
+This project is licensed under the [Apache License 2.0](LICENSE).
diff --git a/README_CN.md b/README_CN.md
index 865f9ba..f78ed89 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -73,6 +73,9 @@
 
 - xds:pixiu 集成 xds
 
+- ai:AI 网关示例集合
+- ai/kvcache:提供两层 KVCache 示例。`mock` 用于本地快速验证 Pixiu + kvcache filter 链路(tokenize/lookup/pin/route hint);`real-engine` 用于 BYOE 场景,对接真实 vLLM + LMCache controller。
+
 ## Dubbo-go-pixiu 生态系统的其他项目
 
 -   **[pixiu-admin](https://github.com/apache/dubbo-go-pixiu/tree/develop/admin)** Dubbo-go-pixiu Admin 是 dubbo-go-pixiu 网关的综合管理平台。它提供了一个集中的控制面板,用于通过基于 Web 的用户界面和 RESTful API 来配置、监控和管理网关资源。
@@ -89,4 +92,4 @@
 
 ## 许可证
 
-本项目采用 [Apache License 2.0](LICENSE) 开源许可。
\ No newline at end of file
+本项目采用 [Apache License 2.0](LICENSE) 开源许可。
diff --git a/ai/kvcache/README.md b/ai/kvcache/README.md
new file mode 100644
index 0000000..02b88f0
--- /dev/null
+++ b/ai/kvcache/README.md
@@ -0,0 +1,12 @@
+# Pixiu KVCache Samples Index
+
+[中文](./README_zh.md) | English
+
+This directory provides two KVCache samples and follows the HOWTO-style sample layout (`pixiu/`, `test/`, request scripts).
+
+## Samples
+
+| Sample | English README | Chinese README | Purpose |
+| --- | --- | --- | --- |
+| `mock/` | [`ai/kvcache/mock/README.md`](./mock/README.md) | [`ai/kvcache/mock/README_zh.md`](./mock/README_zh.md) | Local validation for tokenize/lookup/pin/route-hint integration. |
+| `real-engine/` | [`ai/kvcache/real-engine/README.md`](./real-engine/README.md) | [`ai/kvcache/real-engine/README_zh.md`](./real-engine/README_zh.md) | BYOE integration with real vLLM + LMCache controller. |
diff --git a/ai/kvcache/README_zh.md b/ai/kvcache/README_zh.md
new file mode 100644
index 0000000..24db4bc
--- /dev/null
+++ b/ai/kvcache/README_zh.md
@@ -0,0 +1,12 @@
+# Pixiu KVCache 示例索引
+
+[English](./README.md) | 中文
+
+本目录提供两层 KVCache 示例,并遵循 HOWTO 约定的 sample 结构(`pixiu/`、`test/`、请求脚本)。
+
+## 示例列表
+
+| 示例 | English README | 中文 README | 目标 |
+| --- | --- | --- | --- |
+| `mock/` | [`ai/kvcache/mock/README.md`](./mock/README.md) | [`ai/kvcache/mock/README_zh.md`](./mock/README_zh.md) | 本地验证 tokenize/lookup/pin/route hint 链路。 |
+| `real-engine/` | [`ai/kvcache/real-engine/README.md`](./real-engine/README.md) | [`ai/kvcache/real-engine/README_zh.md`](./real-engine/README_zh.md) | BYOE 对接真实 vLLM + LMCache controller。 |
diff --git a/ai/kvcache/mock/README.md b/ai/kvcache/mock/README.md
new file mode 100644
index 0000000..2c99fb4
--- /dev/null
+++ b/ai/kvcache/mock/README.md
@@ -0,0 +1,40 @@
+# KVCache Mock Sample
+
+[Back to KVCache Index](../README.md) | English | [中文](./README_zh.md)
+
+## Layout
+
+- `pixiu/conf.yaml`: Pixiu configuration
+- `server/controller/main.go`: mock LMCache controller (`/lookup` `/pin` `/compress` `/evict`)
+- `server/engine-a/main.go`: mock engine A (`/tokenize` + `/v1/chat/completions`)
+- `server/engine-b/main.go`: mock engine B (`/v1/chat/completions`)
+- `request.sh`: request script
+- `test/pixiu_test.go`: integration test
+- `run.sh`: one-command startup + validation
+
+## Quick Start (CLI)
+
+1. Start sample services and Pixiu:
+
+```bash
+cd ai/kvcache/mock
+./run.sh
+```
+
+2. Send manual requests:
+
+```bash
+./request.sh
+```
+
+3. Run test case:
+
+```bash
+go test -v ./test/pixiu_test.go
+```
+
+## Verification Targets
+
+- tokenize call works (`engine-a`)
+- lookup/pin calls work (`controller`)
+- second same prompt is routed to preferred endpoint (`mock-llm-b`)
diff --git a/ai/kvcache/mock/README_zh.md b/ai/kvcache/mock/README_zh.md
new file mode 100644
index 0000000..4a8a9ee
--- /dev/null
+++ b/ai/kvcache/mock/README_zh.md
@@ -0,0 +1,40 @@
+# KVCache Mock 示例
+
+[返回 KVCache 索引](../README_zh.md) | [English](./README.md) | 中文
+
+## 目录结构
+
+- `pixiu/conf.yaml`: Pixiu 配置
+- `server/controller/main.go`: mock LMCache controller(`/lookup` `/pin` `/compress` `/evict`)
+- `server/engine-a/main.go`: mock 引擎 A(`/tokenize` + `/v1/chat/completions`)
+- `server/engine-b/main.go`: mock 引擎 B(`/v1/chat/completions`)
+- `request.sh`: 请求脚本
+- `test/pixiu_test.go`: 集成测试
+- `run.sh`: 一键启动并验收
+
+## 命令行快速开始
+
+1. 启动 sample 服务和 Pixiu:
+
+```bash
+cd ai/kvcache/mock
+./run.sh
+```
+
+2. 手工发请求:
+
+```bash
+./request.sh
+```
+
+3. 运行测试用例:
+
+```bash
+go test -v ./test/pixiu_test.go
+```
+
+## 验证目标
+
+- tokenize 调用生效(`engine-a`)
+- lookup/pin 调用生效(`controller`)
+- 同 prompt 第二次请求路由到 preferred endpoint(`mock-llm-b`)
diff --git a/ai/kvcache/mock/pixiu/conf.yaml b/ai/kvcache/mock/pixiu/conf.yaml
new file mode 100644
index 0000000..7c50c85
--- /dev/null
+++ b/ai/kvcache/mock/pixiu/conf.yaml
@@ -0,0 +1,97 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+---
+static_resources:
+  listeners:
+    - name: "kvcache_mock"
+      protocol_type: "HTTP"
+      address:
+        socket_address:
+          address: "0.0.0.0"
+          port: 18888
+      filter_chains:
+        filters:
+          - name: dgp.filter.httpconnectionmanager
+            config:
+              route_config:
+                routes:
+                  - match:
+                      prefix: "/v1/chat/completions"
+                    route:
+                      cluster: "mock_llm"
+                      cluster_not_found_response_code: 505
+              http_filters:
+                - name: dgp.filter.ai.kvcache
+                  config:
+                    enabled: true
+                    vllm_endpoint: "http://127.0.0.1:18091"
+                    lmcache_endpoint: "http://127.0.0.1:18081"
+                    default_model: "mock-model"
+                    request_timeout: 3s
+                    lookup_routing_timeout: 80ms
+                    hot_window: 2m
+                    hot_max_records: 100
+                    token_cache:
+                      enabled: true
+                      max_size: 1024
+                      ttl: 10m
+                    cache_strategy:
+                      enable_compression: true
+                      enable_pinning: true
+                      enable_eviction: true
+                      memory_threshold: 0.000001
+                      hot_content_threshold: 1
+                      load_threshold: 0.000001
+                      pin_instance_id: "mock-llm-b"
+                      pin_location: "ram-b"
+                      compress_instance_id: "mock-llm-b"
+                      compress_location: "ram-b"
+                      compress_method: "zstd"
+                      evict_instance_id: "mock-llm-a"
+                - name: dgp.filter.llm.proxy
+                  config:
+                    timeout: 30s
+                    maxIdleConns: 100
+                    maxIdleConnsPerHost: 100
+                    maxConnsPerHost: 100
+                    scheme: "http"
+      config:
+        idle_timeout: 30s
+        read_timeout: 30s
+        write_timeout: 30s
+
+  clusters:
+    - name: "mock_llm"
+      lb_policy: "round_robin"
+      endpoints:
+        - id: "mock-llm-a"
+          socket_address:
+            address: "127.0.0.1"
+            port: 18091
+        - id: "mock-llm-b"
+          socket_address:
+            address: "127.0.0.1"
+            port: 18092
+
+  shutdown_config:
+    timeout: "10s"
+    step_timeout: "2s"
+    reject_policy: "immediacy"
+
+metric:
+  enable: true
+  prometheus_port: 2222
diff --git a/ai/kvcache/mock/request.sh b/ai/kvcache/mock/request.sh
new file mode 100755
index 0000000..56a45ee
--- /dev/null
+++ b/ai/kvcache/mock/request.sh
@@ -0,0 +1,58 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+set -euo pipefail
+
+PIXIU_URL="${PIXIU_URL:-http://127.0.0.1:18888}"
+CONTROLLER_URL="${CONTROLLER_URL:-http://127.0.0.1:18081}"
+ENGINE_A_URL="${ENGINE_A_URL:-http://127.0.0.1:18091}"
+ENGINE_B_URL="${ENGINE_B_URL:-http://127.0.0.1:18092}"
+
+export NO_PROXY="${NO_PROXY:-127.0.0.1,localhost}"
+export no_proxy="${no_proxy:-127.0.0.1,localhost}"
+
+body='{"model":"mock-model","messages":[{"role":"user","content":"route this same prompt"}]}'
+
+echo "reset stats"
+curl -fsS -X POST "${CONTROLLER_URL}/reset" >/dev/null
+curl -fsS -X POST "${ENGINE_A_URL}/reset" >/dev/null
+curl -fsS -X POST "${ENGINE_B_URL}/reset" >/dev/null
+
+echo "request #1"
+curl -fsS -H 'Content-Type: application/json' -X POST "${PIXIU_URL}/v1/chat/completions" -d "${body}"
+echo ""
+
+sleep 0.4
+echo "request #2"
+curl -fsS -H 'Content-Type: application/json' -X POST "${PIXIU_URL}/v1/chat/completions" -d "${body}"
+echo ""
+
+sleep 0.8
+echo "controller stats"
+curl -fsS "${CONTROLLER_URL}/stats"
+echo ""
+
+echo "engine-a stats"
+curl -fsS "${ENGINE_A_URL}/stats"
+echo ""
+
+echo "engine-b stats"
+curl -fsS "${ENGINE_B_URL}/stats"
+echo ""
diff --git a/ai/kvcache/mock/run.sh b/ai/kvcache/mock/run.sh
new file mode 100755
index 0000000..745b278
--- /dev/null
+++ b/ai/kvcache/mock/run.sh
@@ -0,0 +1,216 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PIXIU_CONFIG="${PIXIU_CONFIG:-${SCRIPT_DIR}/pixiu/conf.yaml}"
+PIXIU_URL="${PIXIU_URL:-http://127.0.0.1:18888}"
+LMCACHE_ADMIN="${LMCACHE_ADMIN:-http://127.0.0.1:18081}"
+ENGINE_A_ADMIN="${ENGINE_A_ADMIN:-http://127.0.0.1:18091}"
+ENGINE_B_ADMIN="${ENGINE_B_ADMIN:-http://127.0.0.1:18092}"
+PIXIU_SOURCE="${PIXIU_SOURCE:-$(cd "${SCRIPT_DIR}/../../../.." && pwd)/dubbo-go-pixiu}"
+GO_CACHE_DIR="${GO_CACHE_DIR:-/tmp/go-build-cache}"
+GO_MOD_CACHE_DIR="${GO_MOD_CACHE_DIR:-/tmp/go-mod-cache}"
+export NO_PROXY="${NO_PROXY:-127.0.0.1,localhost}"
+export no_proxy="${no_proxy:-127.0.0.1,localhost}"
+
+WORK_DIR="$(mktemp -d /tmp/kvcache-mock-XXXXXX)"
+CONTROLLER_LOG="${WORK_DIR}/mock_controller.log"
+ENGINE_A_LOG="${WORK_DIR}/mock_engine_a.log"
+ENGINE_B_LOG="${WORK_DIR}/mock_engine_b.log"
+PIXIU_LOG="${WORK_DIR}/pixiu.log"
+
+CONTROLLER_PID=""
+ENGINE_A_PID=""
+ENGINE_B_PID=""
+PIXIU_PID=""
+
+REQ_BODY='{"model":"mock-model","messages":[{"role":"user","content":"please explain kv cache routing"}]}'
+
+cleanup() {
+  set +e
+  if [[ -n "${PIXIU_PID}" ]] && kill -0 "${PIXIU_PID}" >/dev/null 2>&1; then
+    kill "${PIXIU_PID}" >/dev/null 2>&1
+    wait "${PIXIU_PID}" >/dev/null 2>&1
+  fi
+  if [[ -n "${CONTROLLER_PID}" ]] && kill -0 "${CONTROLLER_PID}" >/dev/null 2>&1; then
+    kill "${CONTROLLER_PID}" >/dev/null 2>&1
+    wait "${CONTROLLER_PID}" >/dev/null 2>&1
+  fi
+  if [[ -n "${ENGINE_A_PID}" ]] && kill -0 "${ENGINE_A_PID}" >/dev/null 2>&1; then
+    kill "${ENGINE_A_PID}" >/dev/null 2>&1
+    wait "${ENGINE_A_PID}" >/dev/null 2>&1
+  fi
+  if [[ -n "${ENGINE_B_PID}" ]] && kill -0 "${ENGINE_B_PID}" >/dev/null 2>&1; then
+    kill "${ENGINE_B_PID}" >/dev/null 2>&1
+    wait "${ENGINE_B_PID}" >/dev/null 2>&1
+  fi
+}
+trap cleanup EXIT INT TERM
+
+extract_num() {
+  local json="$1"
+  local key="$2"
+  sed -n "s/.*\"${key}\":\([0-9][0-9]*\).*/\1/p" <<<"${json}"
+}
+
+extract_str() {
+  local json="$1"
+  local key="$2"
+  sed -n "s/.*\"${key}\":\"\([^\"]*\)\".*/\1/p" <<<"${json}"
+}
+
+wait_for_health() {
+  local url="$1"
+  for _ in $(seq 1 80); do
+    if curl -fsS "${url}" >/dev/null 2>&1; then
+      return 0
+    fi
+    sleep 0.2
+  done
+  return 1
+}
+
+wait_for_pixiu() {
+  for _ in $(seq 1 80); do
+    local status
+    status="$(curl -s -o "${WORK_DIR}/kvcache-pixiu-ready.out" -w '%{http_code}' \
+      -H 'Content-Type: application/json' \
+      -X POST "${PIXIU_URL}/v1/chat/completions" \
+      -d "${REQ_BODY}" || true)"
+    if [[ "${status}" == "200" || "${status}" == "4"* || "${status}" == "5"* ]]; then
+      return 0
+    fi
+    sleep 0.2
+  done
+  return 1
+}
+
+start_mocks() {
+  (
+    cd "${SCRIPT_DIR}"
+    env GOCACHE="${GO_CACHE_DIR}" GOMODCACHE="${GO_MOD_CACHE_DIR}" go run ./server/controller >"${CONTROLLER_LOG}" 2>&1
+  ) &
+  CONTROLLER_PID="$!"
+
+  (
+    cd "${SCRIPT_DIR}"
+    env GOCACHE="${GO_CACHE_DIR}" GOMODCACHE="${GO_MOD_CACHE_DIR}" go run ./server/engine-a >"${ENGINE_A_LOG}" 2>&1
+  ) &
+  ENGINE_A_PID="$!"
+
+  (
+    cd "${SCRIPT_DIR}"
+    env GOCACHE="${GO_CACHE_DIR}" GOMODCACHE="${GO_MOD_CACHE_DIR}" go run ./server/engine-b >"${ENGINE_B_LOG}" 2>&1
+  ) &
+  ENGINE_B_PID="$!"
+
+  wait_for_health "${LMCACHE_ADMIN}/health"
+  wait_for_health "${ENGINE_A_ADMIN}/health"
+  wait_for_health "${ENGINE_B_ADMIN}/health"
+}
+
+start_pixiu() {
+  if command -v pixiu >/dev/null 2>&1; then
+    pixiu gateway start -c "${PIXIU_CONFIG}" >"${PIXIU_LOG}" 2>&1 &
+    PIXIU_PID="$!"
+  elif [[ -d "${PIXIU_SOURCE}/cmd/pixiu" ]]; then
+    (
+      cd "${PIXIU_SOURCE}"
+      env GOCACHE="${GO_CACHE_DIR}" GOMODCACHE="${GO_MOD_CACHE_DIR}" go run ./cmd/pixiu/*.go gateway start -c "${PIXIU_CONFIG}"
+    ) >"${PIXIU_LOG}" 2>&1 &
+    PIXIU_PID="$!"
+  else
+    echo "cannot find pixiu binary or source. set PIXIU_SOURCE or install pixiu in PATH"
+    return 1
+  fi
+
+  if ! wait_for_pixiu; then
+    echo "pixiu start failed, log: ${PIXIU_LOG}"
+    return 1
+  fi
+}
+
+echo "[1/4] starting mock controller + engines"
+start_mocks
+
+echo "[2/4] starting pixiu"
+start_pixiu
+
+curl -fsS -X POST "${LMCACHE_ADMIN}/reset" >/dev/null
+curl -fsS -X POST "${ENGINE_A_ADMIN}/reset" >/dev/null
+curl -fsS -X POST "${ENGINE_B_ADMIN}/reset" >/dev/null
+
+echo "[3/4] sending warmup and routed requests"
+resp1="$(curl -fsS -H 'Content-Type: application/json' -X POST "${PIXIU_URL}/v1/chat/completions" -d "${REQ_BODY}")"
+sleep 0.6
+resp2="$(curl -fsS -H 'Content-Type: application/json' -X POST "${PIXIU_URL}/v1/chat/completions" -d "${REQ_BODY}")"
+
+sleep 1.0
+controller_stats="$(curl -fsS "${LMCACHE_ADMIN}/stats")"
+engine_a_stats="$(curl -fsS "${ENGINE_A_ADMIN}/stats")"
+engine_b_stats="$(curl -fsS "${ENGINE_B_ADMIN}/stats")"
+
+echo "[4/4] evaluating results"
+second_served_by="$(extract_str "${resp2}" "served_by")"
+tokenize_calls="$(extract_num "${engine_a_stats}" "tokenize_calls")"
+lookup_calls="$(extract_num "${controller_stats}" "lookup_calls")"
+pin_calls="$(extract_num "${controller_stats}" "pin_calls")"
+llm_b_calls="$(extract_num "${engine_b_stats}" "chat_calls")"
+
+fail=0
+if [[ -z "${tokenize_calls}" || "${tokenize_calls}" -lt 1 ]]; then
+  echo "FAIL: tokenize was not called on engine-a"
+  fail=1
+fi
+if [[ -z "${lookup_calls}" || "${lookup_calls}" -lt 2 ]]; then
+  echo "FAIL: lookup should be called at least twice (cache build + route hint)"
+  fail=1
+fi
+if [[ -z "${pin_calls}" || "${pin_calls}" -lt 1 ]]; then
+  echo "FAIL: pin was not called"
+  fail=1
+fi
+if [[ "${second_served_by}" != "mock-llm-b" ]]; then
+  echo "FAIL: expected routed request served_by=mock-llm-b, got '${second_served_by}'"
+  fail=1
+fi
+if [[ -z "${llm_b_calls}" || "${llm_b_calls}" -lt 1 ]]; then
+  echo "FAIL: preferred endpoint mock-llm-b received no traffic"
+  fail=1
+fi
+
+echo ""
+echo "response#1: ${resp1}"
+echo "response#2: ${resp2}"
+echo "controller_stats: ${controller_stats}"
+echo "engine_a_stats: ${engine_a_stats}"
+echo "engine_b_stats: ${engine_b_stats}"
+echo "controller log: ${CONTROLLER_LOG}"
+echo "engine-a log: ${ENGINE_A_LOG}"
+echo "engine-b log: ${ENGINE_B_LOG}"
+echo "pixiu log: ${PIXIU_LOG}"
+
+if [[ "${fail}" -ne 0 ]]; then
+  exit 1
+fi
+
+echo "PASS: kvcache mock sample validated with 3 isolated mocks (controller + two engines)."
diff --git a/ai/kvcache/mock/server/app/main.go b/ai/kvcache/mock/server/app/main.go
new file mode 100644
index 0000000..957201f
--- /dev/null
+++ b/ai/kvcache/mock/server/app/main.go
@@ -0,0 +1,465 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package main
+
+import (
+       "encoding/json"
+       "fmt"
+       "log"
+       "net/http"
+       "os"
+       "strconv"
+       "strings"
+       "sync"
+       "sync/atomic"
+       "time"
+)
+
+type controllerStats struct {
+       mu sync.Mutex
+
+       lookupCalls   int
+       lookupSuccess int
+       lookupFailure int
+       pinCalls      int
+       compressCalls int
+       evictCalls    int
+}
+
+type engineAStats struct {
+       mu sync.Mutex
+
+       tokenizeCalls int
+       chatCalls     int
+}
+
+type engineBStats struct {
+       mu sync.Mutex
+
+       chatCalls int
+}
+
+type lookupRequest struct {
+       Tokens []int `json:"tokens"`
+}
+
+type tokensRequest struct {
+       Tokens []int `json:"tokens"`
+}
+
+type eventResp struct {
+       EventID   string `json:"event_id"`
+       NumTokens int    `json:"num_tokens"`
+}
+
+type llmRequest struct {
+       Model string `json:"model"`
+}
+
+type llmMessage struct {
+       Role    string `json:"role"`
+       Content string `json:"content"`
+}
+
+type llmChoice struct {
+       Index   int        `json:"index"`
+       Message llmMessage `json:"message"`
+}
+
+type llmResponse struct {
+       ID       string         `json:"id"`
+       Object   string         `json:"object"`
+       Model    string         `json:"model"`
+       ServedBy string         `json:"served_by"`
+       Choices  []llmChoice    `json:"choices"`
+       Usage    map[string]int `json:"usage"`
+}
+
+type tokenizeResponse struct {
+       Count  int   `json:"count"`
+       Tokens []int `json:"tokens"`
+       MaxLen int   `json:"max_model_len"`
+}
+
+var globalEventCounter uint64
+
+func main() {
+       controllerAddr := envOrDefault("LMCACHE_ADDR", ":18081")
+       engineAAddr := envOrDefault("LLM_A_ADDR", ":18091")
+       engineBAddr := envOrDefault("LLM_B_ADDR", ":18092")
+       preferredEndpoint := envOrDefault("PREFERRED_ENDPOINT_ID", "mock-llm-b")
+       engineAID := envOrDefault("LLM_A_ID", "mock-llm-a")
+       engineBID := envOrDefault("LLM_B_ID", "mock-llm-b")
+
+       controller := buildControllerMux(preferredEndpoint)
+       engineA := buildEngineAMux(engineAID)
+       engineB := buildEngineBMux(engineBID)
+
+       errCh := make(chan error, 3)
+       go serve("mock-controller", controllerAddr, controller, errCh)
+       go serve("mock-engine-a", engineAAddr, engineA, errCh)
+       go serve("mock-engine-b", engineBAddr, engineB, errCh)
+
+       err := <-errCh
+       log.Fatalf("kvcache mock app exited: %v", err)
+}
+
+func serve(name string, addr string, handler http.Handler, errCh chan<- error) {
+       srv := &http.Server{
+               Addr:              addr,
+               Handler:           handler,
+               ReadHeaderTimeout: 3 * time.Second,
+       }
+       log.Printf("[%s] listening on %s", name, addr)
+       if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
+               errCh <- fmt.Errorf("%s failed: %w", name, err)
+       }
+}
+
+func buildControllerMux(preferred string) http.Handler {
+       stats := &controllerStats{}
+       mux := http.NewServeMux()
+
+       mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
+               writeJSON(w, http.StatusOK, map[string]any{"ok": true, "component": "mock-controller"})
+       })
+       mux.HandleFunc("/stats", func(w http.ResponseWriter, _ *http.Request) {
+               stats.mu.Lock()
+               defer stats.mu.Unlock()
+               writeJSON(w, http.StatusOK, map[string]any{
+                       "lookup_calls":         stats.lookupCalls,
+                       "lookup_success":       stats.lookupSuccess,
+                       "lookup_failure":       stats.lookupFailure,
+                       "pin_calls":            stats.pinCalls,
+                       "compress_calls":       stats.compressCalls,
+                       "evict_calls":          stats.evictCalls,
+                       "preferred_endpoint":   preferred,
+                       "timestamp_unix_milli": time.Now().UnixMilli(),
+               })
+       })
+       mux.HandleFunc("/reset", func(w http.ResponseWriter, r *http.Request) {
+               if r.Method != http.MethodPost {
+                       writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
+                       return
+               }
+               stats.mu.Lock()
+               stats.lookupCalls = 0
+               stats.lookupSuccess = 0
+               stats.lookupFailure = 0
+               stats.pinCalls = 0
+               stats.compressCalls = 0
+               stats.evictCalls = 0
+               stats.mu.Unlock()
+               writeJSON(w, http.StatusOK, map[string]any{"ok": true})
+       })
+       mux.HandleFunc("/lookup", func(w http.ResponseWriter, r *http.Request) {
+               if r.Method != http.MethodPost {
+                       writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
+                       return
+               }
+               var req lookupRequest
+               if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+                       stats.mu.Lock()
+                       stats.lookupFailure++
+                       stats.mu.Unlock()
+                       writeJSON(w, http.StatusBadRequest, map[string]string{"error": "invalid request"})
+                       return
+               }
+               tokenCount := len(req.Tokens)
+               if tokenCount == 0 {
+                       tokenCount = 1
+               }
+               layout := map[string]map[string]any{
+                       "mock-llm-a": {"0": "ram-a", "1": max(tokenCount/2, 1)},
+                       "mock-llm-b": {"0": "ram-b", "1": tokenCount + 3},
+               }
+               if preferred == "mock-llm-a" {
+                       layout["mock-llm-a"]["1"] = tokenCount + 3
+                       layout["mock-llm-b"]["1"] = max(tokenCount/2, 1)
+               }
+
+               stats.mu.Lock()
+               stats.lookupCalls++
+               stats.lookupSuccess++
+               stats.mu.Unlock()
+
+               writeJSON(w, http.StatusOK, map[string]any{
+                       "event_id":    nextEventID("lookup"),
+                       "layout_info": layout,
+               })
+       })
+       mux.HandleFunc("/pin", func(w http.ResponseWriter, r *http.Request) {
+               handleTokenEvent(stats, w, r, "pin")
+       })
+       mux.HandleFunc("/compress", func(w http.ResponseWriter, r *http.Request) {
+               handleTokenEvent(stats, w, r, "compress")
+       })
+       mux.HandleFunc("/evict", func(w http.ResponseWriter, r *http.Request) {
+               handleTokenEvent(stats, w, r, "evict")
+       })
+
+       return mux
+}
+
+func handleTokenEvent(stats *controllerStats, w http.ResponseWriter, r *http.Request, op string) {
+       if r.Method != http.MethodPost {
+               writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
+               return
+       }
+       var req tokensRequest
+       if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+               writeJSON(w, http.StatusBadRequest, map[string]string{"error": "invalid request"})
+               return
+       }
+       stats.mu.Lock()
+       switch op {
+       case "pin":
+               stats.pinCalls++
+       case "compress":
+               stats.compressCalls++
+       case "evict":
+               stats.evictCalls++
+       }
+       stats.mu.Unlock()
+       writeJSON(w, http.StatusOK, eventResp{EventID: nextEventID(op), NumTokens: len(req.Tokens)})
+}
+
+func buildEngineAMux(engineID string) http.Handler {
+       stats := &engineAStats{}
+       mux := http.NewServeMux()
+
+       mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
+               writeJSON(w, http.StatusOK, map[string]any{"ok": true, "engine": engineID, "tokenize_enabled": true})
+       })
+       mux.HandleFunc("/stats", func(w http.ResponseWriter, _ *http.Request) {
+               stats.mu.Lock()
+               defer stats.mu.Unlock()
+               writeJSON(w, http.StatusOK, map[string]any{
+                       "tokenize_calls":       stats.tokenizeCalls,
+                       "chat_calls":           stats.chatCalls,
+                       "engine_id":            engineID,
+                       "timestamp_unix_milli": time.Now().UnixMilli(),
+               })
+       })
+       mux.HandleFunc("/reset", func(w http.ResponseWriter, r *http.Request) {
+               if r.Method != http.MethodPost {
+                       writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
+                       return
+               }
+               stats.mu.Lock()
+               stats.tokenizeCalls = 0
+               stats.chatCalls = 0
+               stats.mu.Unlock()
+               writeJSON(w, http.StatusOK, map[string]any{"ok": true})
+       })
+       mux.HandleFunc("/tokenize", func(w http.ResponseWriter, r *http.Request) {
+               if r.Method != http.MethodPost {
+                       writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
+                       return
+               }
+               prompt := extractPrompt(r)
+               tokens := tokenizePrompt(prompt)
+
+               stats.mu.Lock()
+               stats.tokenizeCalls++
+               stats.mu.Unlock()
+
+               writeJSON(w, http.StatusOK, tokenizeResponse{Count: len(tokens), Tokens: tokens, MaxLen: 8192})
+       })
+       mux.HandleFunc("/v1/chat/completions", func(w http.ResponseWriter, r *http.Request) {
+               if r.Method != http.MethodPost {
+                       writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
+                       return
+               }
+               var req llmRequest
+               _ = json.NewDecoder(r.Body).Decode(&req)
+               if req.Model == "" {
+                       req.Model = "mock-model"
+               }
+               stats.mu.Lock()
+               stats.chatCalls++
+               stats.mu.Unlock()
+
+               resp := llmResponse{
+                       ID:       nextEventID("chatcmpl"),
+                       Object:   "chat.completion",
+                       Model:    req.Model,
+                       ServedBy: engineID,
+                       Choices: []llmChoice{{
+                               Index: 0,
+                               Message: llmMessage{
+                                       Role:    "assistant",
+                                       Content: fmt.Sprintf("mock response from %s", engineID),
+                               },
+                       }},
+                       Usage: map[string]int{"prompt_tokens": 8, "completion_tokens": 8, "total_tokens": 16},
+               }
+               writeJSON(w, http.StatusOK, resp)
+       })
+
+       return mux
+}
+
+func buildEngineBMux(engineID string) http.Handler {
+       stats := &engineBStats{}
+       mux := http.NewServeMux()
+
+       mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
+               writeJSON(w, http.StatusOK, map[string]any{"ok": true, "engine": engineID, "tokenize_enabled": false})
+       })
+       mux.HandleFunc("/stats", func(w http.ResponseWriter, _ *http.Request) {
+               stats.mu.Lock()
+               defer stats.mu.Unlock()
+               writeJSON(w, http.StatusOK, map[string]any{
+                       "chat_calls":           stats.chatCalls,
+                       "engine_id":            engineID,
+                       "timestamp_unix_milli": time.Now().UnixMilli(),
+               })
+       })
+       mux.HandleFunc("/reset", func(w http.ResponseWriter, r *http.Request) {
+               if r.Method != http.MethodPost {
+                       writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
+                       return
+               }
+               stats.mu.Lock()
+               stats.chatCalls = 0
+               stats.mu.Unlock()
+               writeJSON(w, http.StatusOK, map[string]any{"ok": true})
+       })
+       mux.HandleFunc("/tokenize", func(w http.ResponseWriter, _ *http.Request) {
+               writeJSON(w, http.StatusNotFound, map[string]string{"error": "tokenize not available on this instance"})
+       })
+       mux.HandleFunc("/v1/chat/completions", func(w http.ResponseWriter, r *http.Request) {
+               if r.Method != http.MethodPost {
+                       writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
+                       return
+               }
+               var req llmRequest
+               _ = json.NewDecoder(r.Body).Decode(&req)
+               if req.Model == "" {
+                       req.Model = "mock-model"
+               }
+
+               stats.mu.Lock()
+               stats.chatCalls++
+               stats.mu.Unlock()
+
+               resp := llmResponse{
+                       ID:       nextEventID("chatcmpl"),
+                       Object:   "chat.completion",
+                       Model:    req.Model,
+                       ServedBy: engineID,
+                       Choices: []llmChoice{{
+                               Index: 0,
+                               Message: llmMessage{
+                                       Role:    "assistant",
+                                       Content: fmt.Sprintf("mock response from %s", engineID),
+                               },
+                       }},
+                       Usage: map[string]int{"prompt_tokens": 8, "completion_tokens": 8, "total_tokens": 16},
+               }
+               writeJSON(w, http.StatusOK, resp)
+       })
+
+       return mux
+}
+
+func extractPrompt(r *http.Request) string {
+       if r == nil || r.Body == nil {
+               return ""
+       }
+       var payload map[string]any
+       if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
+               return ""
+       }
+       if prompt, ok := payload["prompt"]; ok {
+               switch v := prompt.(type) {
+               case string:
+                       return strings.TrimSpace(v)
+               case []any:
+                       parts := make([]string, 0, len(v))
+                       for _, item := range v {
+                               if str, ok := item.(string); ok {
+                                       parts = append(parts, str)
+                               }
+                       }
+                       return strings.Join(parts, "\n")
+               }
+       }
+       messages, ok := payload["messages"].([]any)
+       if !ok {
+               return ""
+       }
+       parts := make([]string, 0, len(messages))
+       for _, item := range messages {
+               msgMap, ok := item.(map[string]any)
+               if !ok {
+                       continue
+               }
+               if content, ok := msgMap["content"].(string); ok {
+                       parts = append(parts, content)
+               }
+       }
+       return strings.Join(parts, "\n")
+}
+
+func tokenizePrompt(prompt string) []int {
+       prompt = strings.TrimSpace(prompt)
+       if prompt == "" {
+               return []int{0}
+       }
+       words := strings.Fields(prompt)
+       if len(words) == 0 {
+               return []int{0}
+       }
+       tokens := make([]int, 0, len(words))
+       for idx, word := range words {
+               sum := 0
+               for _, r := range word {
+                       sum += int(r)
+               }
+               tokens = append(tokens, (sum%997)+idx+1)
+       }
+       return tokens
+}
+
+func nextEventID(prefix string) string {
+       n := atomic.AddUint64(&globalEventCounter, 1)
+       return prefix + "-" + strconv.FormatUint(n, 10)
+}
+
+func writeJSON(w http.ResponseWriter, status int, payload any) {
+       w.Header().Set("Content-Type", "application/json")
+       w.WriteHeader(status)
+       _ = json.NewEncoder(w).Encode(payload)
+}
+
+func envOrDefault(key string, fallback string) string {
+       val, ok := os.LookupEnv(key)
+       if !ok || strings.TrimSpace(val) == "" {
+               return fallback
+       }
+       return val
+}
+
+func max(a int, b int) int {
+       if a > b {
+               return a
+       }
+       return b
+}
diff --git a/ai/kvcache/mock/server/controller/main.go b/ai/kvcache/mock/server/controller/main.go
new file mode 100644
index 0000000..6b90f1c
--- /dev/null
+++ b/ai/kvcache/mock/server/controller/main.go
@@ -0,0 +1,197 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package main
+
+import (
+       "encoding/json"
+       "log"
+       "net/http"
+       "os"
+       "strconv"
+       "strings"
+       "sync"
+       "sync/atomic"
+       "time"
+)
+
+type controllerStats struct {
+       mu sync.Mutex
+
+       lookupCalls   int
+       lookupSuccess int
+       lookupFailure int
+       pinCalls      int
+       compressCalls int
+       evictCalls    int
+}
+
+type lookupRequest struct {
+       Tokens []int `json:"tokens"`
+}
+
+type tokensRequest struct {
+       Tokens []int `json:"tokens"`
+}
+
+type eventResp struct {
+       EventID   string `json:"event_id"`
+       NumTokens int    `json:"num_tokens"`
+}
+
+var eventCounter uint64
+
+func main() {
+       addr := envOrDefault("LMCACHE_ADDR", ":18081")
+       preferred := envOrDefault("PREFERRED_ENDPOINT_ID", "mock-llm-b")
+       stats := &controllerStats{}
+
+       mux := http.NewServeMux()
+       mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
+               writeJSON(w, http.StatusOK, map[string]any{"ok": true, "component": "mock-controller"})
+       })
+       mux.HandleFunc("/stats", func(w http.ResponseWriter, _ *http.Request) {
+               stats.mu.Lock()
+               defer stats.mu.Unlock()
+               writeJSON(w, http.StatusOK, map[string]any{
+                       "lookup_calls":         stats.lookupCalls,
+                       "lookup_success":       stats.lookupSuccess,
+                       "lookup_failure":       stats.lookupFailure,
+                       "pin_calls":            stats.pinCalls,
+                       "compress_calls":       stats.compressCalls,
+                       "evict_calls":          stats.evictCalls,
+                       "preferred_endpoint":   preferred,
+                       "timestamp_unix_milli": time.Now().UnixMilli(),
+               })
+       })
+       mux.HandleFunc("/reset", func(w http.ResponseWriter, r *http.Request) {
+               if r.Method != http.MethodPost {
+                       writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
+                       return
+               }
+               stats.mu.Lock()
+               stats.lookupCalls = 0
+               stats.lookupSuccess = 0
+               stats.lookupFailure = 0
+               stats.pinCalls = 0
+               stats.compressCalls = 0
+               stats.evictCalls = 0
+               stats.mu.Unlock()
+               writeJSON(w, http.StatusOK, map[string]any{"ok": true})
+       })
+       mux.HandleFunc("/lookup", func(w http.ResponseWriter, r *http.Request) {
+               if r.Method != http.MethodPost {
+                       writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
+                       return
+               }
+               var req lookupRequest
+               if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+                       stats.mu.Lock()
+                       stats.lookupFailure++
+                       stats.mu.Unlock()
+                       writeJSON(w, http.StatusBadRequest, map[string]string{"error": "invalid request"})
+                       return
+               }
+               tokenCount := len(req.Tokens)
+               if tokenCount == 0 {
+                       tokenCount = 1
+               }
+               layout := map[string]map[string]any{
+                       "mock-llm-a": {"0": "ram-a", "1": max(tokenCount/2, 1)},
+                       "mock-llm-b": {"0": "ram-b", "1": tokenCount + 3},
+               }
+               if preferred == "mock-llm-a" {
+                       layout["mock-llm-a"]["1"] = tokenCount + 3
+                       layout["mock-llm-b"]["1"] = max(tokenCount/2, 1)
+               }
+
+               stats.mu.Lock()
+               stats.lookupCalls++
+               stats.lookupSuccess++
+               stats.mu.Unlock()
+
+               writeJSON(w, http.StatusOK, map[string]any{
+                       "event_id":    nextEventID("lookup"),
+                       "layout_info": layout,
+               })
+       })
+       mux.HandleFunc("/pin", func(w http.ResponseWriter, r *http.Request) {
+               handleTokenEvent(stats, w, r, "pin")
+       })
+       mux.HandleFunc("/compress", func(w http.ResponseWriter, r *http.Request) {
+               handleTokenEvent(stats, w, r, "compress")
+       })
+       mux.HandleFunc("/evict", func(w http.ResponseWriter, r *http.Request) {
+               handleTokenEvent(stats, w, r, "evict")
+       })
+
+       srv := &http.Server{Addr: addr, Handler: mux, ReadHeaderTimeout: 3 * time.Second}
+       log.Printf("[mock-controller] listening on %s", addr)
+       if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
+               log.Fatalf("[mock-controller] server failed: %v", err)
+       }
+}
+
+func handleTokenEvent(stats *controllerStats, w http.ResponseWriter, r *http.Request, op string) {
+       if r.Method != http.MethodPost {
+               writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
+               return
+       }
+       var req tokensRequest
+       if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+               writeJSON(w, http.StatusBadRequest, map[string]string{"error": "invalid request"})
+               return
+       }
+
+       stats.mu.Lock()
+       switch op {
+       case "pin":
+               stats.pinCalls++
+       case "compress":
+               stats.compressCalls++
+       case "evict":
+               stats.evictCalls++
+       }
+       stats.mu.Unlock()
+
+       writeJSON(w, http.StatusOK, eventResp{EventID: nextEventID(op), NumTokens: len(req.Tokens)})
+}
+
+func writeJSON(w http.ResponseWriter, status int, payload any) {
+       w.Header().Set("Content-Type", "application/json")
+       w.WriteHeader(status)
+       _ = json.NewEncoder(w).Encode(payload)
+}
+
+func nextEventID(prefix string) string {
+       n := atomic.AddUint64(&eventCounter, 1)
+       return prefix + "-" + strconv.FormatUint(n, 10)
+}
+
+func envOrDefault(key string, fallback string) string {
+       val, ok := os.LookupEnv(key)
+       if !ok || strings.TrimSpace(val) == "" {
+               return fallback
+       }
+       return val
+}
+
+func max(a int, b int) int {
+       if a > b {
+               return a
+       }
+       return b
+}
diff --git a/ai/kvcache/mock/server/engine-a/main.go b/ai/kvcache/mock/server/engine-a/main.go
new file mode 100644
index 0000000..bb24e05
--- /dev/null
+++ b/ai/kvcache/mock/server/engine-a/main.go
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package main
+
+import (
+       "encoding/json"
+       "fmt"
+       "log"
+       "net/http"
+       "os"
+       "strconv"
+       "strings"
+       "sync"
+       "sync/atomic"
+       "time"
+)
+
+type engineStats struct {
+       mu sync.Mutex
+
+       tokenizeCalls int
+       chatCalls     int
+}
+
+type llmRequest struct {
+       Model string `json:"model"`
+}
+
+type llmResponse struct {
+       ID       string         `json:"id"`
+       Object   string         `json:"object"`
+       Model    string         `json:"model"`
+       ServedBy string         `json:"served_by"`
+       Choices  []llmChoice    `json:"choices"`
+       Usage    map[string]int `json:"usage"`
+}
+
+type llmChoice struct {
+       Index   int        `json:"index"`
+       Message llmMessage `json:"message"`
+}
+
+type llmMessage struct {
+       Role    string `json:"role"`
+       Content string `json:"content"`
+}
+
+type tokenizeResponse struct {
+       Count  int   `json:"count"`
+       Tokens []int `json:"tokens"`
+       MaxLen int   `json:"max_model_len"`
+}
+
+var eventCounter uint64
+
+func main() {
+       addr := envOrDefault("LLM_A_ADDR", ":18091")
+       engineID := envOrDefault("LLM_A_ID", "mock-llm-a")
+       stats := &engineStats{}
+
+       mux := http.NewServeMux()
+       mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
+               writeJSON(w, http.StatusOK, map[string]any{"ok": true, "engine": engineID, "tokenize_enabled": true})
+       })
+       mux.HandleFunc("/stats", func(w http.ResponseWriter, _ *http.Request) {
+               stats.mu.Lock()
+               defer stats.mu.Unlock()
+               writeJSON(w, http.StatusOK, map[string]any{
+                       "tokenize_calls":       stats.tokenizeCalls,
+                       "chat_calls":           stats.chatCalls,
+                       "engine_id":            engineID,
+                       "timestamp_unix_milli": time.Now().UnixMilli(),
+               })
+       })
+       mux.HandleFunc("/reset", func(w http.ResponseWriter, r *http.Request) {
+               if r.Method != http.MethodPost {
+                       writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
+                       return
+               }
+               stats.mu.Lock()
+               stats.tokenizeCalls = 0
+               stats.chatCalls = 0
+               stats.mu.Unlock()
+               writeJSON(w, http.StatusOK, map[string]any{"ok": true})
+       })
+       mux.HandleFunc("/tokenize", func(w http.ResponseWriter, r *http.Request) {
+               if r.Method != http.MethodPost {
+                       writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
+                       return
+               }
+               prompt := extractPrompt(r)
+               tokens := tokenizePrompt(prompt)
+
+               stats.mu.Lock()
+               stats.tokenizeCalls++
+               stats.mu.Unlock()
+
+               writeJSON(w, http.StatusOK, tokenizeResponse{Count: len(tokens), Tokens: tokens, MaxLen: 8192})
+       })
+       mux.HandleFunc("/v1/chat/completions", func(w http.ResponseWriter, r *http.Request) {
+               if r.Method != http.MethodPost {
+                       writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
+                       return
+               }
+
+               var req llmRequest
+               _ = json.NewDecoder(r.Body).Decode(&req)
+               if req.Model == "" {
+                       req.Model = "mock-model"
+               }
+
+               stats.mu.Lock()
+               stats.chatCalls++
+               stats.mu.Unlock()
+
+               resp := llmResponse{
+                       ID:       nextEventID("chatcmpl"),
+                       Object:   "chat.completion",
+                       Model:    req.Model,
+                       ServedBy: engineID,
+                       Choices: []llmChoice{{
+                               Index: 0,
+                               Message: llmMessage{
+                                       Role:    "assistant",
+                                       Content: fmt.Sprintf("mock response from %s", engineID),
+                               },
+                       }},
+                       Usage: map[string]int{"prompt_tokens": 8, "completion_tokens": 8, "total_tokens": 16},
+               }
+               writeJSON(w, http.StatusOK, resp)
+       })
+
+       srv := &http.Server{Addr: addr, Handler: mux, ReadHeaderTimeout: 3 * time.Second}
+       log.Printf("[mock-engine-a] listening on %s", addr)
+       if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
+               log.Fatalf("[mock-engine-a] server failed: %v", err)
+       }
+}
+
+func extractPrompt(r *http.Request) string {
+       if r == nil || r.Body == nil {
+               return ""
+       }
+       var payload map[string]any
+       if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
+               return ""
+       }
+       if prompt, ok := payload["prompt"]; ok {
+               switch v := prompt.(type) {
+               case string:
+                       return strings.TrimSpace(v)
+               case []any:
+                       parts := make([]string, 0, len(v))
+                       for _, item := range v {
+                               if str, ok := item.(string); ok {
+                                       parts = append(parts, str)
+                               }
+                       }
+                       return strings.Join(parts, "\n")
+               }
+       }
+       messages, ok := payload["messages"].([]any)
+       if !ok {
+               return ""
+       }
+       parts := make([]string, 0, len(messages))
+       for _, item := range messages {
+               msgMap, ok := item.(map[string]any)
+               if !ok {
+                       continue
+               }
+               if content, ok := msgMap["content"].(string); ok {
+                       parts = append(parts, content)
+               }
+       }
+       return strings.Join(parts, "\n")
+}
+
+func tokenizePrompt(prompt string) []int {
+       prompt = strings.TrimSpace(prompt)
+       if prompt == "" {
+               return []int{0}
+       }
+       words := strings.Fields(prompt)
+       if len(words) == 0 {
+               return []int{0}
+       }
+       tokens := make([]int, 0, len(words))
+       for idx, word := range words {
+               sum := 0
+               for _, r := range word {
+                       sum += int(r)
+               }
+               tokens = append(tokens, (sum%997)+idx+1)
+       }
+       return tokens
+}
+
+func writeJSON(w http.ResponseWriter, status int, payload any) {
+       w.Header().Set("Content-Type", "application/json")
+       w.WriteHeader(status)
+       _ = json.NewEncoder(w).Encode(payload)
+}
+
+func nextEventID(prefix string) string {
+       n := atomic.AddUint64(&eventCounter, 1)
+       return prefix + "-" + strconv.FormatUint(n, 10)
+}
+
+func envOrDefault(key string, fallback string) string {
+       val, ok := os.LookupEnv(key)
+       if !ok || strings.TrimSpace(val) == "" {
+               return fallback
+       }
+       return val
+}
diff --git a/ai/kvcache/mock/server/engine-b/main.go b/ai/kvcache/mock/server/engine-b/main.go
new file mode 100644
index 0000000..c5ebcab
--- /dev/null
+++ b/ai/kvcache/mock/server/engine-b/main.go
@@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package main
+
+import (
+       "encoding/json"
+       "fmt"
+       "log"
+       "net/http"
+       "os"
+       "strconv"
+       "strings"
+       "sync"
+       "sync/atomic"
+       "time"
+)
+
+type engineStats struct {
+       mu sync.Mutex
+
+       chatCalls int
+}
+
+type llmRequest struct {
+       Model string `json:"model"`
+}
+
+type llmResponse struct {
+       ID       string         `json:"id"`
+       Object   string         `json:"object"`
+       Model    string         `json:"model"`
+       ServedBy string         `json:"served_by"`
+       Choices  []llmChoice    `json:"choices"`
+       Usage    map[string]int `json:"usage"`
+}
+
+type llmChoice struct {
+       Index   int        `json:"index"`
+       Message llmMessage `json:"message"`
+}
+
+type llmMessage struct {
+       Role    string `json:"role"`
+       Content string `json:"content"`
+}
+
+var eventCounter uint64
+
+func main() {
+       addr := envOrDefault("LLM_B_ADDR", ":18092")
+       engineID := envOrDefault("LLM_B_ID", "mock-llm-b")
+       stats := &engineStats{}
+
+       mux := http.NewServeMux()
+       mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
+               writeJSON(w, http.StatusOK, map[string]any{"ok": true, "engine": engineID, "tokenize_enabled": false})
+       })
+       mux.HandleFunc("/stats", func(w http.ResponseWriter, _ *http.Request) {
+               stats.mu.Lock()
+               defer stats.mu.Unlock()
+               writeJSON(w, http.StatusOK, map[string]any{
+                       "chat_calls":           stats.chatCalls,
+                       "engine_id":            engineID,
+                       "timestamp_unix_milli": time.Now().UnixMilli(),
+               })
+       })
+       mux.HandleFunc("/reset", func(w http.ResponseWriter, r *http.Request) {
+               if r.Method != http.MethodPost {
+                       writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
+                       return
+               }
+               stats.mu.Lock()
+               stats.chatCalls = 0
+               stats.mu.Unlock()
+               writeJSON(w, http.StatusOK, map[string]any{"ok": true})
+       })
+       mux.HandleFunc("/tokenize", func(w http.ResponseWriter, _ *http.Request) {
+               writeJSON(w, http.StatusNotFound, map[string]string{"error": "tokenize not available on this instance"})
+       })
+       mux.HandleFunc("/v1/chat/completions", func(w http.ResponseWriter, r *http.Request) {
+               if r.Method != http.MethodPost {
+                       writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
+                       return
+               }
+
+               var req llmRequest
+               _ = json.NewDecoder(r.Body).Decode(&req)
+               if req.Model == "" {
+                       req.Model = "mock-model"
+               }
+
+               stats.mu.Lock()
+               stats.chatCalls++
+               stats.mu.Unlock()
+
+               resp := llmResponse{
+                       ID:       nextEventID("chatcmpl"),
+                       Object:   "chat.completion",
+                       Model:    req.Model,
+                       ServedBy: engineID,
+                       Choices: []llmChoice{{
+                               Index: 0,
+                               Message: llmMessage{
+                                       Role:    "assistant",
+                                       Content: fmt.Sprintf("mock response from %s", engineID),
+                               },
+                       }},
+                       Usage: map[string]int{"prompt_tokens": 8, "completion_tokens": 8, "total_tokens": 16},
+               }
+               writeJSON(w, http.StatusOK, resp)
+       })
+
+       srv := &http.Server{Addr: addr, Handler: mux, ReadHeaderTimeout: 3 * time.Second}
+       log.Printf("[mock-engine-b] listening on %s", addr)
+       if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
+               log.Fatalf("[mock-engine-b] server failed: %v", err)
+       }
+}
+
+func writeJSON(w http.ResponseWriter, status int, payload any) {
+       w.Header().Set("Content-Type", "application/json")
+       w.WriteHeader(status)
+       _ = json.NewEncoder(w).Encode(payload)
+}
+
+func nextEventID(prefix string) string {
+       n := atomic.AddUint64(&eventCounter, 1)
+       return prefix + "-" + strconv.FormatUint(n, 10)
+}
+
+func envOrDefault(key string, fallback string) string {
+       val, ok := os.LookupEnv(key)
+       if !ok || strings.TrimSpace(val) == "" {
+               return fallback
+       }
+       return val
+}
diff --git a/ai/kvcache/mock/test/pixiu_test.go b/ai/kvcache/mock/test/pixiu_test.go
new file mode 100644
index 0000000..8bfba6a
--- /dev/null
+++ b/ai/kvcache/mock/test/pixiu_test.go
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package test
+
+import (
+       "bytes"
+       "encoding/json"
+       "io"
+       "net/http"
+       "os"
+       "testing"
+       "time"
+)
+
+const (
+       defaultPixiuURL      = "http://127.0.0.1:18888"
+       defaultControllerURL = "http://127.0.0.1:18081"
+       defaultEngineAURL    = "http://127.0.0.1:18091"
+       defaultEngineBURL    = "http://127.0.0.1:18092"
+)
+
+var testHTTPClient = &http.Client{Timeout: 3 * time.Second}
+
+func getEnvOrDefault(key string, fallback string) string {
+       if v := os.Getenv(key); v != "" {
+               return v
+       }
+       return fallback
+}
+
+func checkServiceAvailable(url string) bool {
+       resp, err := testHTTPClient.Get(url)
+       if err != nil {
+               return false
+       }
+       defer resp.Body.Close()
+       return resp.StatusCode >= 200 && resp.StatusCode < 300
+}
+
+func checkPixiuAvailable(url string) bool {
+       payload := map[string]any{
+               "model":  "mock-model",
+               "prompt": "kvcache availability probe",
+       }
+       data, err := json.Marshal(payload)
+       if err != nil {
+               return false
+       }
+       req, err := http.NewRequest(http.MethodPost, url+"/v1/chat/completions", bytes.NewReader(data))
+       if err != nil {
+               return false
+       }
+       req.Header.Set("Content-Type", "application/json")
+       resp, err := testHTTPClient.Do(req)
+       if err != nil {
+               return false
+       }
+       defer resp.Body.Close()
+       return resp.StatusCode >= 200 && resp.StatusCode < 600
+}
+
+func postJSON(t *testing.T, url string, payload any) map[string]any {
+       t.Helper()
+       data, err := json.Marshal(payload)
+       if err != nil {
+               t.Fatalf("marshal payload failed: %v", err)
+       }
+
+       req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(data))
+       if err != nil {
+               t.Fatalf("create request failed: %v", err)
+       }
+       req.Header.Set("Content-Type", "application/json")
+
+       resp, err := testHTTPClient.Do(req)
+       if err != nil {
+               t.Fatalf("request failed: %v", err)
+       }
+       defer resp.Body.Close()
+       body, _ := io.ReadAll(resp.Body)
+       if resp.StatusCode < 200 || resp.StatusCode >= 300 {
+               t.Fatalf("request status %d: %s", resp.StatusCode, string(body))
+       }
+
+       var out map[string]any
+       if err := json.Unmarshal(body, &out); err != nil {
+               t.Fatalf("unmarshal response failed: %v body=%s", err, string(body))
+       }
+       return out
+}
+
+func getJSON(t *testing.T, url string) map[string]any {
+       t.Helper()
+       resp, err := testHTTPClient.Get(url)
+       if err != nil {
+               t.Fatalf("get request failed: %v", err)
+       }
+       defer resp.Body.Close()
+       body, _ := io.ReadAll(resp.Body)
+       if resp.StatusCode < 200 || resp.StatusCode >= 300 {
+               t.Fatalf("get status %d: %s", resp.StatusCode, string(body))
+       }
+
+       var out map[string]any
+       if err := json.Unmarshal(body, &out); err != nil {
+               t.Fatalf("unmarshal response failed: %v body=%s", err, string(body))
+       }
+       return out
+}
+
+func toInt(t *testing.T, value any, key string) int {
+       t.Helper()
+       num, ok := value.(float64)
+       if !ok {
+               t.Fatalf("%s is not numeric: %#v", key, value)
+       }
+       return int(num)
+}
+
+func TestKVCacheMockRoutingFlow(t *testing.T) {
+       pixiuURL := getEnvOrDefault("PIXIU_URL", defaultPixiuURL)
+       controllerURL := getEnvOrDefault("CONTROLLER_URL", defaultControllerURL)
+       engineAURL := getEnvOrDefault("ENGINE_A_URL", defaultEngineAURL)
+       engineBURL := getEnvOrDefault("ENGINE_B_URL", defaultEngineBURL)
+
+       if !checkServiceAvailable(controllerURL+"/health") ||
+               !checkServiceAvailable(engineAURL+"/health") ||
+               !checkServiceAvailable(engineBURL+"/health") ||
+               !checkPixiuAvailable(pixiuURL) {
+               t.Skip("required services are unavailable; start mock servers and pixiu first")
+       }
+
+       postJSON(t, controllerURL+"/reset", map[string]any{})
+       postJSON(t, engineAURL+"/reset", map[string]any{})
+       postJSON(t, engineBURL+"/reset", map[string]any{})
+
+       payload := map[string]any{
+               "model": "mock-model",
+               "messages": []map[string]any{{
+                       "role":    "user",
+                       "content": "please route same prompt for kvcache test",
+               }},
+       }
+
+       postJSON(t, pixiuURL+"/v1/chat/completions", payload)
+       time.Sleep(600 * time.Millisecond)
+       second := postJSON(t, pixiuURL+"/v1/chat/completions", payload)
+
+       servedBy, _ := second["served_by"].(string)
+       if servedBy != "mock-llm-b" {
+               t.Fatalf("expected served_by mock-llm-b, got %q", servedBy)
+       }
+
+       time.Sleep(1 * time.Second)
+       controllerStats := getJSON(t, controllerURL+"/stats")
+       engineAStats := getJSON(t, engineAURL+"/stats")
+       engineBStats := getJSON(t, engineBURL+"/stats")
+
+       if got := toInt(t, engineAStats["tokenize_calls"], "tokenize_calls"); got < 1 {
+               t.Fatalf("expected tokenize_calls >= 1, got %d", got)
+       }
+       if got := toInt(t, controllerStats["lookup_calls"], "lookup_calls"); got < 2 {
+               t.Fatalf("expected lookup_calls >= 2, got %d", got)
+       }
+       if got := toInt(t, controllerStats["pin_calls"], "pin_calls"); got < 1 {
+               t.Fatalf("expected pin_calls >= 1, got %d", got)
+       }
+       if got := toInt(t, engineBStats["chat_calls"], "chat_calls"); got < 1 {
+               t.Fatalf("expected engine-b chat_calls >= 1, got %d", got)
+       }
+}
diff --git a/ai/kvcache/real-engine/README.md b/ai/kvcache/real-engine/README.md
new file mode 100644
index 0000000..da15ac4
--- /dev/null
+++ b/ai/kvcache/real-engine/README.md
@@ -0,0 +1,45 @@
+# KVCache BYOE (Real Engine) Sample
+
+[Back to KVCache Index](../README.md) | English | [中文](./README_zh.md)
+
+## Layout
+
+- `pixiu/conf.yaml`: BYOE Pixiu template (env-driven)
+- `request.sh`: manual request script
+- `test/pixiu_test.go`: smoke test (requires external BYOE env)
+- `verify.sh`: metrics-oriented verification script
+
+## Prerequisites
+
+- a reachable `VLLM_ENDPOINT` exposing `/tokenize`
+- a reachable `LMCACHE_ENDPOINT` exposing `/lookup`, `/pin`, `/compress`, and `/evict`
+- a Pixiu binary, or the Pixiu source repository
+
+## Quick Start (CLI)
+
+1. Configure the environment variables and run the verification:
+
+```bash
+cd ai/kvcache/real-engine
+export VLLM_ENDPOINT="http://<vllm-host>:<port>"
+export LMCACHE_ENDPOINT="http://<lmcache-host>:<port>"
+./verify.sh
+```
+
+2. Send a manual request:
+
+```bash
+./request.sh
+```
+
+3. Run the test case:
+
+```bash
+go test -v ./test/pixiu_test.go
+```
+
+## Acceptance Metrics
+
+- lookup success rate
+- preferred endpoint hit rate
+- p95 latency comparison (without cache vs. with cache)
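
The p95 numbers come from a nearest-rank percentile: `verify.sh` sorts the recorded `time_total` samples and takes the ceil(N * 0.95)-th value (1-indexed). A standalone sketch of the same awk rule, with made-up latencies:

```shell
# Nearest-rank p95 over five illustrative samples.
# ceil(5 * 0.95) = 5, so the largest value (0.50) is selected.
printf '%s\n' 0.10 0.30 0.20 0.50 0.40 | sort -n | awk '
  { arr[NR] = $1 }
  END {
    idx = int((NR * 95 + 99) / 100)  # integer ceiling of NR * 0.95
    if (idx < 1) idx = 1
    print arr[idx]
  }'
# prints: 0.50
```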
diff --git a/ai/kvcache/real-engine/README_zh.md b/ai/kvcache/real-engine/README_zh.md
new file mode 100644
index 0000000..8d82d3f
--- /dev/null
+++ b/ai/kvcache/real-engine/README_zh.md
@@ -0,0 +1,45 @@
+# KVCache BYOE(真实引擎)示例
+
+[返回 KVCache 索引](../README_zh.md) | [English](./README.md) | 中文
+
+## 目录结构
+
+- `pixiu/conf.yaml`: BYOE Pixiu 模板(环境变量驱动)
+- `request.sh`: 手工请求脚本
+- `test/pixiu_test.go`: 冒烟测试(依赖外部 BYOE 环境)
+- `verify.sh`: 指标验证脚本
+
+## 前置条件
+
+- 可访问的 `VLLM_ENDPOINT`(包含 `/tokenize`)
+- 可访问的 `LMCACHE_ENDPOINT`(包含 `/lookup` `/pin` `/compress` `/evict`)
+- 可用 Pixiu 二进制或 Pixiu 源码仓库
+
+## 命令行快速开始
+
+1. 配置环境变量并执行验证:
+
+```bash
+cd ai/kvcache/real-engine
+export VLLM_ENDPOINT="http://<vllm-host>:<port>"
+export LMCACHE_ENDPOINT="http://<lmcache-host>:<port>"
+./verify.sh
+```
+
+2. 手工发请求:
+
+```bash
+./request.sh
+```
+
+3. 运行测试用例:
+
+```bash
+go test -v ./test/pixiu_test.go
+```
+
+## 验收指标
+
+- lookup 成功率
+- preferred endpoint 命中率
+- p95 延迟对比(无缓存 vs 有缓存)
diff --git a/ai/kvcache/real-engine/pixiu/conf.yaml b/ai/kvcache/real-engine/pixiu/conf.yaml
new file mode 100644
index 0000000..4c5e56c
--- /dev/null
+++ b/ai/kvcache/real-engine/pixiu/conf.yaml
@@ -0,0 +1,101 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+---
+# This file is intended to be rendered by verify.sh (envsubst).
+static_resources:
+  listeners:
+    - name: "kvcache_real_engine"
+      protocol_type: "HTTP"
+      address:
+        socket_address:
+          address: "0.0.0.0"
+          port: ${PIXIU_LISTEN_PORT}
+      filter_chains:
+        filters:
+          - name: dgp.filter.httpconnectionmanager
+            config:
+              route_config:
+                routes:
+                  - match:
+                      prefix: "/v1/chat/completions"
+                    route:
+                      cluster: "real_llm"
+                      cluster_not_found_response_code: 505
+              http_filters:
+                - name: dgp.filter.ai.kvcache
+                  config:
+                    enabled: true
+                    vllm_endpoint: "${VLLM_ENDPOINT}"
+                    lmcache_endpoint: "${LMCACHE_ENDPOINT}"
+                    default_model: "${MODEL_NAME}"
+                    request_timeout: 5s
+                    lookup_routing_timeout: 150ms
+                    hot_window: 5m
+                    hot_max_records: 500
+                    token_cache:
+                      enabled: true
+                      max_size: 20000
+                      ttl: 20m
+                    cache_strategy:
+                      enable_compression: true
+                      enable_pinning: true
+                      enable_eviction: true
+                      memory_threshold: ${MEMORY_THRESHOLD}
+                      hot_content_threshold: ${HOT_CONTENT_THRESHOLD}
+                      load_threshold: ${LOAD_THRESHOLD}
+                      pin_instance_id: "${PIN_INSTANCE_ID}"
+                      pin_location: "${PIN_LOCATION}"
+                      compress_instance_id: "${COMPRESS_INSTANCE_ID}"
+                      compress_location: "${COMPRESS_LOCATION}"
+                      compress_method: "zstd"
+                      evict_instance_id: "${EVICT_INSTANCE_ID}"
+                - name: dgp.filter.llm.proxy
+                  config:
+                    timeout: 90s
+                    maxIdleConns: 200
+                    maxIdleConnsPerHost: 200
+                    maxConnsPerHost: 200
+                    scheme: "${LLM_SCHEME}"
+      config:
+        idle_timeout: 120s
+        read_timeout: 120s
+        write_timeout: 120s
+
+  clusters:
+    - name: "real_llm"
+      lb_policy: "round_robin"
+      endpoints:
+        - id: "${ENGINE_ENDPOINT_A_ID}"
+          socket_address:
+            address: "${ENGINE_ENDPOINT_A_HOST}"
+            port: ${ENGINE_ENDPOINT_A_PORT}
+        - id: "${ENGINE_ENDPOINT_B_ID}"
+          socket_address:
+            address: "${ENGINE_ENDPOINT_B_HOST}"
+            port: ${ENGINE_ENDPOINT_B_PORT}
+
+  shutdown_config:
+    timeout: "30s"
+    step_timeout: "5s"
+    reject_policy: "immediacy"
+
+metric:
+  enable: true
+  prometheus_port: ${PIXIU_PROM_PORT}
diff --git a/ai/kvcache/real-engine/request.sh b/ai/kvcache/real-engine/request.sh
new file mode 100755
index 0000000..6b827c8
--- /dev/null
+++ b/ai/kvcache/real-engine/request.sh
@@ -0,0 +1,37 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+set -euo pipefail
+
+PIXIU_URL="${PIXIU_URL:-http://127.0.0.1:18889}"
+MODEL_NAME="${MODEL_NAME:-Qwen2.5-3B-Instruct}"
+
+export NO_PROXY="${NO_PROXY:-127.0.0.1,localhost}"
+export no_proxy="${no_proxy:-127.0.0.1,localhost}"
+
+body=$(cat <<JSON
+{"model":"${MODEL_NAME}","messages":[{"role":"user","content":"kvcache byoe verification request"}]}
+JSON
+)
+
+curl -sS -H 'Content-Type: application/json' \
+  -X POST "${PIXIU_URL}/v1/chat/completions" \
+  -d "${body}"
+echo ""
diff --git a/ai/kvcache/real-engine/test/pixiu_test.go b/ai/kvcache/real-engine/test/pixiu_test.go
new file mode 100644
index 0000000..8b794b8
--- /dev/null
+++ b/ai/kvcache/real-engine/test/pixiu_test.go
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test
+
+import (
+       "bytes"
+       "encoding/json"
+       "io"
+       "net/http"
+       "os"
+       "testing"
+       "time"
+)
+
+const defaultBYOEPixiuURL = "http://127.0.0.1:18889"
+
+var testHTTPClient = &http.Client{Timeout: 3 * time.Second}
+
+func envOrDefault(key string, fallback string) string {
+       if v := os.Getenv(key); v != "" {
+               return v
+       }
+       return fallback
+}
+
+func checkPixiuAvailable(url string) bool {
+       payload := map[string]any{
+               "model":  envOrDefault("MODEL_NAME", "Qwen2.5-3B-Instruct"),
+               "prompt": "kvcache byoe availability probe",
+       }
+       data, err := json.Marshal(payload)
+       if err != nil {
+               return false
+       }
+       req, err := http.NewRequest(http.MethodPost, url+"/v1/chat/completions", bytes.NewReader(data))
+       if err != nil {
+               return false
+       }
+       req.Header.Set("Content-Type", "application/json")
+       resp, err := testHTTPClient.Do(req)
+       if err != nil {
+               return false
+       }
+       defer resp.Body.Close()
+       return resp.StatusCode >= 200 && resp.StatusCode < 600
+}
+
+func TestBYOEEnvironmentAndGatewayAvailability(t *testing.T) {
+       if os.Getenv("VLLM_ENDPOINT") == "" || os.Getenv("LMCACHE_ENDPOINT") == "" {
+               t.Skip("VLLM_ENDPOINT/LMCACHE_ENDPOINT not set; BYOE environment not configured")
+       }
+
+       pixiuURL := envOrDefault("PIXIU_URL", defaultBYOEPixiuURL)
+       if !checkPixiuAvailable(pixiuURL) {
+               t.Skip("pixiu gateway is unavailable; start pixiu with ai/kvcache/real-engine/pixiu/conf.yaml first")
+       }
+}
+
+func TestBYOERequestSmoke(t *testing.T) {
+       if os.Getenv("VLLM_ENDPOINT") == "" || os.Getenv("LMCACHE_ENDPOINT") == "" {
+               t.Skip("VLLM_ENDPOINT/LMCACHE_ENDPOINT not set; BYOE environment not configured")
+       }
+
+       pixiuURL := envOrDefault("PIXIU_URL", defaultBYOEPixiuURL)
+       if !checkPixiuAvailable(pixiuURL) {
+               t.Skip("pixiu gateway is unavailable; start pixiu first")
+       }
+
+       payload := map[string]any{
+               "model": envOrDefault("MODEL_NAME", "Qwen2.5-3B-Instruct"),
+               "messages": []map[string]any{{
+                       "role":    "user",
+                       "content": "kvcache byoe smoke test",
+               }},
+       }
+       data, err := json.Marshal(payload)
+       if err != nil {
+               t.Fatalf("marshal payload failed: %v", err)
+       }
+       req, err := http.NewRequest(http.MethodPost, pixiuURL+"/v1/chat/completions", bytes.NewReader(data))
+       if err != nil {
+               t.Fatalf("create request failed: %v", err)
+       }
+       req.Header.Set("Content-Type", "application/json")
+
+       resp, err := testHTTPClient.Do(req)
+       if err != nil {
+               t.Fatalf("request failed: %v", err)
+       }
+       defer resp.Body.Close()
+       body, _ := io.ReadAll(resp.Body)
+       if resp.StatusCode < 200 || resp.StatusCode >= 300 {
+               t.Fatalf("unexpected status %d: %s", resp.StatusCode, string(body))
+       }
+}
diff --git a/ai/kvcache/real-engine/verify.sh b/ai/kvcache/real-engine/verify.sh
new file mode 100755
index 0000000..0ff0f38
--- /dev/null
+++ b/ai/kvcache/real-engine/verify.sh
@@ -0,0 +1,372 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PIXIU_SOURCE="${PIXIU_SOURCE:-}"
+START_PIXIU="${START_PIXIU:-1}"
+GO_CACHE_DIR="${GO_CACHE_DIR:-/tmp/go-build-cache}"
+GO_MOD_CACHE_DIR="${GO_MOD_CACHE_DIR:-/tmp/go-mod-cache}"
+KEEP_WORK_DIR="${KEEP_WORK_DIR:-0}"
+export NO_PROXY="${NO_PROXY:-127.0.0.1,localhost}"
+export no_proxy="${no_proxy:-127.0.0.1,localhost}"
+
+required_cmds=(curl jq envsubst awk sort)
+for cmd in "${required_cmds[@]}"; do
+  if ! command -v "${cmd}" >/dev/null 2>&1; then
+    echo "missing required command: ${cmd}"
+    exit 1
+  fi
+done
+
+require_env() {
+  local key="$1"
+  if [[ -z "${!key:-}" ]]; then
+    echo "missing required env: ${key}"
+    exit 1
+  fi
+}
+
+parse_url_scheme() {
+  local url="$1"
+  if [[ "${url}" == https://* ]]; then
+    echo "https"
+  else
+    echo "http"
+  fi
+}
+
+parse_url_host() {
+  local url="$1"
+  local tmp="${url#*://}"
+  tmp="${tmp%%/*}"
+  echo "${tmp%%:*}"
+}
+
+parse_url_port() {
+  local url="$1"
+  local tmp="${url#*://}"
+  tmp="${tmp%%/*}"
+  if [[ "${tmp}" == *:* ]]; then
+    echo "${tmp##*:}"
+    return
+  fi
+  if [[ "${url}" == https://* ]]; then
+    echo "443"
+  else
+    echo "80"
+  fi
+}
+
+require_env VLLM_ENDPOINT
+require_env LMCACHE_ENDPOINT
+
+export PIXIU_LISTEN_PORT="${PIXIU_LISTEN_PORT:-18889}"
+export PIXIU_PROM_PORT="${PIXIU_PROM_PORT:-2223}"
+export MODEL_NAME="${MODEL_NAME:-Qwen2.5-3B-Instruct}"
+export HOT_CONTENT_THRESHOLD="${HOT_CONTENT_THRESHOLD:-3}"
+export LOAD_THRESHOLD="${LOAD_THRESHOLD:-0.7}"
+export MEMORY_THRESHOLD="${MEMORY_THRESHOLD:-0.8}"
+export LLM_SCHEME="${LLM_SCHEME:-$(parse_url_scheme "${VLLM_ENDPOINT}")}" # override to https if needed
+
+export ENGINE_ENDPOINT_A_ID="${ENGINE_ENDPOINT_A_ID:-engine-a}"
+export ENGINE_ENDPOINT_B_ID="${ENGINE_ENDPOINT_B_ID:-engine-b}"
+export ENGINE_ENDPOINT_A_HOST="${ENGINE_ENDPOINT_A_HOST:-$(parse_url_host "${VLLM_ENDPOINT}")}"
+export ENGINE_ENDPOINT_A_PORT="${ENGINE_ENDPOINT_A_PORT:-$(parse_url_port "${VLLM_ENDPOINT}")}"
+export ENGINE_ENDPOINT_B_HOST="${ENGINE_ENDPOINT_B_HOST:-${ENGINE_ENDPOINT_A_HOST}}"
+export ENGINE_ENDPOINT_B_PORT="${ENGINE_ENDPOINT_B_PORT:-${ENGINE_ENDPOINT_A_PORT}}"
+
+export PIN_INSTANCE_ID="${PIN_INSTANCE_ID:-${ENGINE_ENDPOINT_A_ID}}"
+export PIN_LOCATION="${PIN_LOCATION:-lmcache}"
+export COMPRESS_INSTANCE_ID="${COMPRESS_INSTANCE_ID:-${ENGINE_ENDPOINT_B_ID}}"
+export COMPRESS_LOCATION="${COMPRESS_LOCATION:-lmcache}"
+export EVICT_INSTANCE_ID="${EVICT_INSTANCE_ID:-${ENGINE_ENDPOINT_A_ID}}"
+
+PIXIU_URL="${PIXIU_URL:-http://127.0.0.1:${PIXIU_LISTEN_PORT}}"
+PROM_URL="${PROM_URL:-http://127.0.0.1:${PIXIU_PROM_PORT}/metrics}"
+
+WORK_DIR="$(mktemp -d /tmp/kvcache-real-verify-XXXXXX)"
+RENDERED_CONFIG="${WORK_DIR}/pixiu.rendered.yaml"
+PIXIU_LOG="${WORK_DIR}/pixiu.log"
+
+BASELINE_ROUNDS="${BASELINE_ROUNDS:-12}"
+CACHED_ROUNDS="${CACHED_ROUNDS:-12}"
+BASELINE_FILE="${WORK_DIR}/baseline.times"
+CACHED_FILE="${WORK_DIR}/cached.times"
+
+PIXIU_PID=""
+
+cleanup() {
+  local exit_code=$?
+  set +e
+  if [[ -n "${PIXIU_PID}" ]] && kill -0 "${PIXIU_PID}" >/dev/null 2>&1; then
+    kill "${PIXIU_PID}" >/dev/null 2>&1
+    wait "${PIXIU_PID}" >/dev/null 2>&1
+  fi
+  if [[ "${KEEP_WORK_DIR}" != "1" && "${exit_code}" -eq 0 && -n "${WORK_DIR:-}" && -d "${WORK_DIR}" ]]; then
+    rm -rf "${WORK_DIR}"
+  fi
+  return "${exit_code}"
+}
+trap cleanup EXIT INT TERM
+
+wait_for_pixiu() {
+  local body
+  body="$(jq -nc --arg model "${MODEL_NAME}" '{model:$model,prompt:"kvcache smoke"}')"
+  for _ in $(seq 1 100); do
+    local code
+    code="$(curl -s -o /dev/null -w '%{http_code}' \
+      -H 'Content-Type: application/json' \
+      -X POST "${PIXIU_URL}/v1/chat/completions" \
+      -d "${body}" || true)"
+    if [[ "${code}" == "200" || "${code}" == "4"* || "${code}" == "5"* ]]; then
+      return 0
+    fi
+    sleep 0.3
+  done
+  return 1
+}
+
+start_pixiu() {
+  if command -v pixiu >/dev/null 2>&1; then
+    pixiu gateway start -c "${RENDERED_CONFIG}" >"${PIXIU_LOG}" 2>&1 &
+    PIXIU_PID="$!"
+  elif [[ -d "${PIXIU_SOURCE}/cmd/pixiu" ]]; then
+    (
+      cd "${PIXIU_SOURCE}"
+      env GOCACHE="${GO_CACHE_DIR}" GOMODCACHE="${GO_MOD_CACHE_DIR}" go run ./cmd/pixiu/*.go gateway start -c "${RENDERED_CONFIG}"
+    ) >"${PIXIU_LOG}" 2>&1 &
+    PIXIU_PID="$!"
+  else
+    echo "cannot find pixiu binary or source. set PIXIU_SOURCE or install pixiu in PATH"
+    exit 1
+  fi
+
+  if ! wait_for_pixiu; then
+    echo "pixiu not ready, log: ${PIXIU_LOG}"
+    exit 1
+  fi
+}
+
+render_config() {
+  envsubst <"${SCRIPT_DIR}/pixiu/conf.yaml" >"${RENDERED_CONFIG}"
+}
+
+p95() {
+  local file="$1"
+  awk 'NF {print $1}' "${file}" | sort -n | awk '
+    {
+      arr[NR] = $1
+    }
+    END {
+      if (NR == 0) {
+        print "0"
+        exit
+      }
+      idx = int((NR * 95 + 99) / 100)
+      if (idx < 1) idx = 1
+      if (idx > NR) idx = NR
+      print arr[idx]
+    }'
+}
+
+avg() {
+  local file="$1"
+  awk '{sum += $1; n += 1} END {if (n == 0) {print "0"} else {printf "%.6f", sum / n}}' "${file}"
+}
+
+run_load() {
+  local mode="$1"
+  local rounds="$2"
+  local output_file="$3"
+  local fixed_prompt="kvcache route preference probe"
+
+  : >"${output_file}"
+  for i in $(seq 1 "${rounds}"); do
+    local prompt
+    if [[ "${mode}" == "baseline" ]]; then
+      prompt="${fixed_prompt} baseline-${i}-$(date +%s%N)"
+    else
+      prompt="${fixed_prompt}"
+    fi
+
+    local body
+    body="$(jq -nc --arg model "${MODEL_NAME}" --arg prompt "${prompt}" '{model:$model,messages:[{role:"user",content:$prompt}]}')"
+
+    local result
+    local response_file="${WORK_DIR}/${mode}-${i}.response.out"
+    result="$(curl -sS -o "${response_file}" -w '%{http_code} %{time_total}' \
+      -H 'Content-Type: application/json' \
+      -X POST "${PIXIU_URL}/v1/chat/completions" \
+      -d "${body}")"
+
+    local status
+    status="${result%% *}"
+    local timing
+    timing="${result##* }"
+
+    if [[ "${status}" != "200" ]]; then
+      echo "request failed in ${mode} mode: status=${status}"
+      cat "${response_file}"
+      exit 1
+    fi
+
+    echo "${timing}" >>"${output_file}"
+  done
+}
+
+lookup_probe() {
+  local tokenize_body
+  tokenize_body="$(jq -nc --arg model "${MODEL_NAME}" '{model:$model,prompt:"kvcache lookup probe"}')"
+
+  local tokenize_resp_file="${WORK_DIR}/lookup-probe-tokenize.response.json"
+  local tokenize_status
+  tokenize_status="$(curl -sS -o "${tokenize_resp_file}" -w '%{http_code}' \
+    -H 'Content-Type: application/json' \
+    -X POST "${VLLM_ENDPOINT}/tokenize" \
+    -d "${tokenize_body}")"
+  if [[ "${tokenize_status}" != "200" ]]; then
+    echo "lookup_probe_error: tokenize returned HTTP ${tokenize_status}"
+    cat "${tokenize_resp_file}"
+    exit 1
+  fi
+
+  local tokens_json
+  tokens_json="$(jq -c 'if type == "object" then (.tokens // []) else [] end' "${tokenize_resp_file}" 2>/dev/null || echo '[]')"
+  if [[ "${tokens_json}" == "[]" ]]; then
+    echo "lookup_probe_error: tokenize response did not contain usable tokens"
+    cat "${tokenize_resp_file}"
+    exit 1
+  fi
+
+  local lookup_body
+  lookup_body="$(jq -nc --argjson t "${tokens_json}" '{tokens:$t}')"
+  local lookup_resp_file="${WORK_DIR}/lookup-probe-lookup.response.json"
+  local lookup_status
+  lookup_status="$(curl -sS -o "${lookup_resp_file}" -w '%{http_code}' \
+    -H 'Content-Type: application/json' \
+    -X POST "${LMCACHE_ENDPOINT}/lookup" \
+    -d "${lookup_body}")"
+  if [[ "${lookup_status}" != "200" ]]; then
+    echo "lookup_probe_error: lookup returned HTTP ${lookup_status}"
+    cat "${lookup_resp_file}"
+    exit 1
+  fi
+
+  local preferred
+  preferred="$(jq -r 'if type == "object" then ((.layout_info // {}) | to_entries | max_by(.value["1"]) | .key) else empty end // empty' "${lookup_resp_file}" 2>/dev/null || true)"
+  if [[ -z "${preferred}" ]]; then
+    echo "lookup_probe_error: cannot parse preferred endpoint from lookup response"
+    cat "${lookup_resp_file}"
+    exit 1
+  fi
+
+  echo "${preferred}"
+}
+
+calc_preferred_hit_rate() {
+  local preferred_id="$1"
+  local preferred_addr=""
+
+  if [[ "${preferred_id}" == "${ENGINE_ENDPOINT_A_ID}" ]]; then
+    preferred_addr="${ENGINE_ENDPOINT_A_HOST}:${ENGINE_ENDPOINT_A_PORT}"
+  elif [[ "${preferred_id}" == "${ENGINE_ENDPOINT_B_ID}" ]]; then
+    preferred_addr="${ENGINE_ENDPOINT_B_HOST}:${ENGINE_ENDPOINT_B_PORT}"
+  else
+    echo "0 0 ${preferred_id}"
+    return
+  fi
+
+  local metrics
+  metrics="$(curl -sS "${PROM_URL}" || true)"
+
+  local preferred_hits
+  preferred_hits="$(awk -v addr="${preferred_addr}" '
+    /^pixiu_llm_upstream_requests_total/ && index($0, "endpoint_address=\"" addr "\"") > 0 {sum += $NF}
+    END {printf "%.0f", sum}
+  ' <<<"${metrics}")"
+
+  local total_hits
+  total_hits="$(awk -v addr_a="${ENGINE_ENDPOINT_A_HOST}:${ENGINE_ENDPOINT_A_PORT}" -v addr_b="${ENGINE_ENDPOINT_B_HOST}:${ENGINE_ENDPOINT_B_PORT}" '
+    /^pixiu_llm_upstream_requests_total/ && (index($0, "endpoint_address=\"" addr_a "\"") > 0 || index($0, "endpoint_address=\"" addr_b "\"") > 0) {sum += $NF}
+    END {printf "%.0f", sum}
+  ' <<<"${metrics}")"
+
+  echo "${preferred_hits} ${total_hits} ${preferred_addr}"
+}
+
+echo "[1/5] rendering pixiu config"
+render_config
+echo "rendered config: ${RENDERED_CONFIG}"
+
+if [[ "${START_PIXIU}" == "1" ]]; then
+  echo "[2/5] starting pixiu"
+  start_pixiu
+else
+  echo "[2/5] using existing pixiu at ${PIXIU_URL}"
+fi
+
+echo "[3/5] probing lookup/preferred endpoint"
+preferred_id="$(lookup_probe)"
+echo "lookup preferred endpoint id: ${preferred_id}"
+
+echo "[4/5] running latency workloads (baseline vs cached)"
+run_load baseline "${BASELINE_ROUNDS}" "${BASELINE_FILE}"
+run_load cached "${CACHED_ROUNDS}" "${CACHED_FILE}"
+
+baseline_p95="$(p95 "${BASELINE_FILE}")"
+cached_p95="$(p95 "${CACHED_FILE}")"
+baseline_avg="$(avg "${BASELINE_FILE}")"
+cached_avg="$(avg "${CACHED_FILE}")"
+
+improvement_pct="$(awk -v b="${baseline_p95}" -v c="${cached_p95}" 'BEGIN { if (b <= 0) {print "0.00"} else {printf "%.2f", ((b-c)/b)*100 } }')"
+
+echo "[5/5] computing preferred endpoint hit rate"
+read -r preferred_hits total_hits preferred_addr < <(calc_preferred_hit_rate "${preferred_id}")
+hit_rate="$(awk -v p="${preferred_hits}" -v t="${total_hits}" 'BEGIN { if (t <= 0) {print "0.00"} else {printf "%.2f", (p/t)*100 } }')"
+
+cat <<REPORT
+
+=== KVCache Real-Engine Verification Report ===
+Pixiu URL: ${PIXIU_URL}
+Prometheus URL: ${PROM_URL}
+Preferred endpoint id (from lookup): ${preferred_id}
+Preferred endpoint addr (mapped): ${preferred_addr}
+
+lookup success: PASS (preferred endpoint parsed)
+preferred endpoint hit: ${preferred_hits}/${total_hits} (${hit_rate}%)
+
+latency baseline avg: ${baseline_avg}s
+latency cached   avg: ${cached_avg}s
+latency baseline p95: ${baseline_p95}s
+latency cached   p95: ${cached_p95}s
+p95 delta (baseline-cached)/baseline: ${improvement_pct}%
+
+rendered config: ${RENDERED_CONFIG}
+REPORT
+
+if [[ "${total_hits}" == "0" ]]; then
+  echo "WARN: no llm upstream metrics observed. verify prometheus endpoint and traffic labels."
+fi
+
+if [[ "${START_PIXIU}" == "1" ]]; then
+  echo "pixiu log: ${PIXIU_LOG}"
+fi
diff --git a/start_integrate_test.sh b/start_integrate_test.sh
index 5242f78..eecbeca 100755
--- a/start_integrate_test.sh
+++ b/start_integrate_test.sh
@@ -39,6 +39,8 @@ array=(
   # plugins
   "plugins/opa/embedded"
   "plugins/opa/server-mode"
+  # ai
+  "ai/kvcache/mock"
 )
 
 for t in "${array[@]}"; do
