This is an automated email from the ASF dual-hosted git repository.
yiguolei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new bd609ea43dc [doc] add vector quantization guide for current and 4.x
(#3461)
bd609ea43dc is described below
commit bd609ea43dc8a304c4965287029328bc5b781e5f
Author: zhiqiang <[email protected]>
AuthorDate: Fri Mar 13 13:53:56 2026 +0800
[doc] add vector quantization guide for current and 4.x (#3461)
## Summary
This PR adds a new vector quantization guide for Doris ANN and syncs it
to both `current` and `4.x` docs.
### Added docs
- `docs/ai/vector-search/quantization-survey.md`
- `versioned_docs/version-4.x/ai/vector-search/quantization-survey.md`
- `i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search/quantization-survey.md`
- `i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/vector-search/quantization-survey.md`
### Sidebar updates
- `sidebars.ts`
- `versioned_sidebars/version-4.x-sidebars.json`
### Assets
- `static/images/vector-search/quantization-survey/*.png`
- SQ: build time vs rows, memory usage vs rows
- PQ: index size on disk vs rows, build time vs rows, search time vs rows
## Notes
- Removed RaBitQ content because Doris does not currently support it.
- Kept the doc in an educational style with practical Doris guidance.
- Preserved a concise Faiss source-level section with proper Doris/Faiss
background context.
---
docs/ai/vector-search/quantization-survey.md | 212 ++++++++++++++++++++
.../ai/vector-search/quantization-survey.md | 213 +++++++++++++++++++++
.../ai/vector-search/quantization-survey.md | 213 +++++++++++++++++++++
sidebars.ts | 1 +
.../quantization-survey/pq-build-time-vs-rows.png | Bin 0 -> 58592 bytes
.../pq-index-size-on-disk-vs-rows.png | Bin 0 -> 67005 bytes
.../quantization-survey/pq-search-time-vs-rows.png | Bin 0 -> 74165 bytes
.../quantization-survey/sq-build-time-vs-rows.png | Bin 0 -> 42287 bytes
.../sq-memory-usage-vs-rows.png | Bin 0 -> 44927 bytes
.../ai/vector-search/quantization-survey.md | 212 ++++++++++++++++++++
versioned_sidebars/version-4.x-sidebars.json | 1 +
11 files changed, 852 insertions(+)
diff --git a/docs/ai/vector-search/quantization-survey.md
b/docs/ai/vector-search/quantization-survey.md
new file mode 100644
index 00000000000..6a2d032120b
--- /dev/null
+++ b/docs/ai/vector-search/quantization-survey.md
@@ -0,0 +1,212 @@
+---
+{
+ "title": "Vector Quantization Survey and Selection Guide",
+ "language": "en",
+ "description": "A practical survey of SQ, PQ, and related quantization
methods for Doris ANN, with trade-offs and selection guidance."
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This document introduces common vector quantization methods from a practical
perspective, and explains how to apply them in Apache Doris ANN workloads.
+
+## Why Quantization Is Needed
+
+For ANN workloads, especially HNSW, index memory can quickly become the
bottleneck. Quantization maps high-precision vectors (usually float32) to
lower-precision codes, trading a small amount of recall for lower memory usage.
+
+In Doris, quantization is controlled by the `quantizer` property in ANN
indexes:
+- `flat`: no quantization (highest quality, highest memory)
+- `sq8`: scalar quantization, 8-bit
+- `sq4`: scalar quantization, 4-bit
+- `pq`: product quantization
+
+Example (HNSW + quantizer):
+
+```sql
+CREATE TABLE vector_tbl (
+ id BIGINT,
+ embedding ARRAY<FLOAT>,
+ INDEX ann_idx (embedding) USING ANN PROPERTIES (
+ "index_type" = "hnsw",
+ "metric_type" = "l2_distance",
+ "dim" = "768",
+ "quantizer" = "sq8"
+ )
+)
+DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 8
+PROPERTIES ("replication_num" = "3");
+```
+
+## Method Overview
+
+| Method | Core Idea | Typical Gain | Main Cost |
+|---|---|---|---|
+| SQ (Scalar Quantization) | Quantize each dimension independently | Large
memory reduction, simple implementation | Build slower than FLAT; recall drops
with stronger compression |
+| PQ (Product Quantization) | Split vector into subvectors, quantize each
subvector with codebooks | Better compression/latency balance on many datasets
| Training/encoding cost is high; tuning is required |
+
+Apache Doris currently uses an optimized Faiss implementation as the core
engine for ANN vector indexing and search. The SQ/PQ behavior discussed below
is therefore directly relevant to Doris in practice.
+
+## Scalar Quantization (SQ)
+
+### Principle
+
+SQ keeps the vector dimension unchanged and only lowers per-dimension
precision.
+
+A standard min-max mapping per dimension is:
+- `max_code = (1 << b) - 1`
+- `scale = (max_val - min_val) / max_code`
+- `code = round((x - min_val) / scale)`
+
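As a concrete illustration of the mapping above, a minimal standalone encoder/decoder might look like this (an assumed `ScalarQuantizer` helper with clamping, for demonstration only; not the Doris or Faiss API):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Per-dimension min-max scalar quantization with b bits (illustrative only).
struct ScalarQuantizer {
    float vmin, vmax;  // trained range for this dimension
    int bits;          // e.g. 8 for sq8, 4 for sq4

    uint32_t encode(float x) const {
        const uint32_t max_code = (1u << bits) - 1;
        const float scale = (vmax - vmin) / max_code;
        float code = std::round((x - vmin) / scale);
        if (code < 0) code = 0;                    // clamp out-of-range inputs
        if (code > max_code) code = max_code;
        return static_cast<uint32_t>(code);
    }

    float decode(uint32_t code) const {
        const uint32_t max_code = (1u << bits) - 1;
        const float scale = (vmax - vmin) / max_code;
        return vmin + code * scale;  // representative value of the bucket
    }
};
```

With `bits = 8` the round-trip error of a single value is bounded by one quantization step (`scale`), which is where the recall loss comes from.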
+Faiss SQ has two styles:
+- Uniform: all dimensions share one min/max range.
+- Non-uniform: each dimension uses its own min/max.
+
+When dimensions have very different value ranges, non-uniform SQ usually gives
better reconstruction quality.
+
+### Key Characteristics
+
+- Strengths:
+ - Straightforward and stable.
+  - Predictable compression (roughly 4x smaller than float32 for `sq8`, roughly 8x for `sq4`).
+- Weaknesses:
+ - Assumes distribution can be bucketed with fixed steps.
+ - If a dimension is highly non-uniform (for example, strong long-tail),
quantization error can increase.
+
+### Faiss Source-Level Note (SQ)
+
+Under the optimized Faiss implementation path used by Doris, SQ training computes min/max statistics first, then expands the range slightly to reduce out-of-range risk at add time. A simplified shape is:
+
+```cpp
+void train_Uniform(..., const float* x, std::vector<float>& trained) {
+ trained.resize(2);
+ float& vmin = trained[0];
+ float& vmax = trained[1];
+ // scan all values to get min/max
+ // then optionally expand range by rs_arg
+}
+```
+
+For non-uniform SQ, Faiss computes statistics per dimension (instead of one
global range), which is why it typically behaves better when different
dimensions have very different value scales.
+
+### Practical Observations
+
+In the internal 128D/256D HNSW tests:
+- `sq8` generally preserved recall better than `sq4`.
+- SQ index build/add time was significantly higher than FLAT.
+- Search latency change was often small for `sq8`, while `sq4` had larger
recall drop.
+
+The following bar charts are based on example benchmark data:
+
+![SQ build time vs rows](/images/vector-search/quantization-survey/sq-build-time-vs-rows.png)
+
+![SQ memory usage vs rows](/images/vector-search/quantization-survey/sq-memory-usage-vs-rows.png)
+
+## Product Quantization (PQ)
+
+### Principle
+
+PQ splits a `D`-dim vector into `M` subvectors (`D/M` dimensions each), then
applies k-means codebooks to each subspace.
+
+Main parameters:
+- `pq_m`: number of subquantizers (subvectors)
+- `pq_nbits`: bits per subvector code
+
+Larger `pq_m` usually improves quality but increases training/encoding cost.
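The memory impact of these two parameters can be estimated with simple arithmetic. The helpers below are hypothetical (not a Doris API) and just restate the definitions:

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to store one encoded vector: pq_m codes of pq_nbits each.
size_t pq_code_bytes(size_t pq_m, size_t pq_nbits) {
    return (pq_m * pq_nbits + 7) / 8;  // round up to whole bytes
}

// Total floats across all codebooks: pq_m subspaces, 2^pq_nbits centroids
// each, dim / pq_m dimensions per centroid (simplifies to 2^pq_nbits * dim).
size_t pq_codebook_floats(size_t dim, size_t pq_m, size_t pq_nbits) {
    const size_t ksub = static_cast<size_t>(1) << pq_nbits;
    return pq_m * ksub * (dim / pq_m);
}
```

For example, `dim = 768`, `pq_m = 96`, `pq_nbits = 8` gives 96 bytes per encoded vector, versus `768 * 4 = 3072` bytes for raw float32, plus a fixed codebook of `256 * 768` floats shared by all vectors.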
+
+### Why PQ Can Be Faster at Query Time
+
+PQ can use LUT (look-up table) distance estimation:
+- Precompute distances between query subvectors and codebook centroids.
+- Approximate full-vector distance by table lookups + accumulation.
+
+This avoids full reconstruction and can reduce search CPU cost.
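The two steps can be sketched as a plain ADC (asymmetric distance computation) routine. This is an illustrative sketch, not the Faiss implementation; it assumes centroids stored contiguously as `(M, ksub, dsub)` and one `uint8` code per subvector:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Squared-L2 distance estimate between a query and one PQ-encoded vector.
float adc_distance(const std::vector<float>& query,      // size M * dsub
                   const std::vector<uint8_t>& code,     // size M
                   const std::vector<float>& centroids,  // size M * ksub * dsub
                   int M, int ksub, int dsub) {
    // Step 1: per-subspace lookup table of query-to-centroid squared distances.
    std::vector<float> lut(M * ksub, 0.0f);
    for (int m = 0; m < M; m++)
        for (int k = 0; k < ksub; k++)
            for (int d = 0; d < dsub; d++) {
                float diff = query[m * dsub + d] -
                             centroids[(m * ksub + k) * dsub + d];
                lut[m * ksub + k] += diff * diff;
            }
    // Step 2: the distance to any encoded vector is M lookups + adds.
    float dist = 0.0f;
    for (int m = 0; m < M; m++)
        dist += lut[m * ksub + code[m]];
    return dist;
}
```

The table costs `M * ksub` subvector distance computations once per query; after that, each candidate costs only `M` lookups and adds, independent of the original dimension `D`.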
+
+### Faiss Source-Level Note (PQ)
+
+Under the same implementation path, Faiss `ProductQuantizer` trains codebooks
over subspaces and stores them in a contiguous centroid table. A simplified
shape is:
+
+```cpp
+void ProductQuantizer::train(size_t n, const float* x) {
+ Clustering clus(dsub, ksub, cp);
+ IndexFlatL2 index(dsub);
+ clus.train(n * M, x, index);
+ for (int m = 0; m < M; m++) {
+ set_params(clus.centroids.data(), m);
+ }
+}
+```
+
+Centroids are laid out as `(M, ksub, dsub)`, where:
+- `M`: number of subquantizers,
+- `ksub`: codebook size per subspace (`2^pq_nbits`),
+- `dsub`: subvector dimension (`D / M`).
+
+### Practical Observations
+
+In the same internal tests:
+- PQ showed clear compression benefits.
+- PQ encoding/training overhead was high.
+- Compared with SQ, PQ often had better search-time behavior due to LUT
acceleration, but recall/build trade-offs depended on data and parameters.
+
+The following bar charts are based on example benchmark data:
+
+![PQ index size on disk vs rows](/images/vector-search/quantization-survey/pq-index-size-on-disk-vs-rows.png)
+
+![PQ build time vs rows](/images/vector-search/quantization-survey/pq-build-time-vs-rows.png)
+
+![PQ search time vs rows](/images/vector-search/quantization-survey/pq-search-time-vs-rows.png)
+
+## Practical Selection Guide for Doris
+
+Use this as a starting point:
+
+1. Memory is sufficient and recall is top priority: `flat`.
+2. Need low risk compression with relatively stable quality: `sq8`.
+3. Extreme memory pressure and can accept lower recall: `sq4`.
+4. Need stronger memory-performance balance and can spend time tuning: `pq`.
+
+Recommended validation process:
+
+1. Start with `flat` as baseline.
+2. Test `sq8` first; compare recall and P95/P99 latency.
+3. If memory is still too high, test `pq` (`pq_m = D/2` as first trial).
+4. Use `sq4` only when memory reduction has higher priority than recall.
+
+## Benchmarking Notes
+
+- Absolute times are hardware/thread/dataset dependent.
+- Compare methods under the same:
+ - vector dimension,
+ - index parameters,
+ - segment size,
+ - query set and ground truth.
+- Evaluate both quality and cost:
+ - Recall@K,
+ - index size,
+ - build time,
+ - query latency.
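For the quality side, Recall@K for a single query is simply the overlap between returned ids and ground-truth ids. A minimal order-insensitive sketch (hypothetical helper):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Fraction of the K ground-truth neighbors present among the K returned ids.
float recall_at_k(std::vector<int> returned, std::vector<int> truth) {
    std::sort(returned.begin(), returned.end());
    std::sort(truth.begin(), truth.end());
    std::vector<int> hit;
    std::set_intersection(returned.begin(), returned.end(),
                          truth.begin(), truth.end(),
                          std::back_inserter(hit));
    return static_cast<float>(hit.size()) / truth.size();
}
```

Averaging this value over the query set, once against each quantizer's results, gives the Recall@K numbers to compare alongside size, build time, and latency.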
+
+## Related Documents
+
+- [Overview](./overview.md)
+- [HNSW](./hnsw.md)
+- [IVF](./ivf.md)
+- [ANN Resource Estimation Guide](./resource-estimation.md)
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search/quantization-survey.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search/quantization-survey.md
new file mode 100644
index 00000000000..a0a4bddb35f
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search/quantization-survey.md
@@ -0,0 +1,213 @@
+---
+{
+ "title": "Vector Quantization Survey and Selection Guide",
+ "sidebar_label": "Quantization Survey",
+ "language": "zh-CN",
+ "description": "A vector quantization survey for Doris ANN, covering SQ, PQ, and selection guidance."
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This document introduces common vector quantization methods from an educational and engineering perspective, and gives selection guidance for Apache Doris ANN scenarios.
+
+## Why Quantization Is Needed
+
+In ANN scenarios (especially HNSW), indexes are often memory-bound. Quantization encodes high-precision vectors (such as float32) into lower-precision representations, trading an acceptable recall loss for lower memory usage.
+
+In Doris, ANN indexes control quantization through the `quantizer` property:
+- `flat`: no quantization (highest quality, highest memory)
+- `sq8`: 8-bit scalar quantization
+- `sq4`: 4-bit scalar quantization
+- `pq`: product quantization
+
+Example (HNSW + quantizer):
+
+```sql
+CREATE TABLE vector_tbl (
+ id BIGINT,
+ embedding ARRAY<FLOAT>,
+ INDEX ann_idx (embedding) USING ANN PROPERTIES (
+ "index_type" = "hnsw",
+ "metric_type" = "l2_distance",
+ "dim" = "768",
+ "quantizer" = "sq8"
+ )
+)
+DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 8
+PROPERTIES ("replication_num" = "3");
+```
+
+## Method Overview
+
+| Method | Core Idea | Typical Gain | Main Cost |
+|---|---|---|---|
+| SQ (Scalar Quantization) | Quantize each dimension independently | Significant memory reduction; simple implementation | Build cost higher than FLAT; recall drops more easily under stronger compression |
+| PQ (Product Quantization) | Split into subvectors and quantize per group | Better balance of compression and query speed in common scenarios | High training/encoding cost; parameters need tuning |
+
+Apache Doris currently uses an optimized Faiss implementation as the core engine for ANN vector indexing and search, so the SQ/PQ mechanics below map directly to Doris behavior in practice.
+
+## Scalar Quantization (SQ)
+
+### Principle
+
+SQ keeps the vector dimension unchanged and only lowers per-dimension numeric precision.
+
+A common min-max quantization mapping:
+- `max_code = (1 << b) - 1`
+- `scale = (max_val - min_val) / max_code`
+- `code = round((x - min_val) / scale)`
+
+Faiss SQ comes in two main styles:
+- Uniform: all dimensions share one min/max range.
+- Non-uniform: each dimension keeps its own min/max statistics.
+
+When value ranges differ greatly across dimensions, non-uniform SQ usually has smaller reconstruction error.
+
+### Key Characteristics
+
+- Strengths:
+  - Straightforward implementation, stable behavior.
+  - Predictable compression (roughly 4x smaller than float32 for `sq8`, roughly 8x for `sq4`).
+- Limitations:
+  - Still fixed-step bucketing in essence.
+  - If a dimension's distribution is clearly non-uniform (for example, long-tailed), quantization error rises.
+
+### Faiss Source-Level Note (SQ)
+
+In the optimized Faiss implementation path used by Doris, SQ training first collects min/max statistics, then slightly expands the range as needed to reduce out-of-range risk at the later add stage. A simplified shape is:
+
+```cpp
+void train_Uniform(..., const float* x, std::vector<float>& trained) {
+ trained.resize(2);
+ float& vmin = trained[0];
+ float& vmax = trained[1];
+    // scan all values to get min/max
+    // then optionally expand range by rs_arg
+}
+```
+
+For non-uniform SQ, Faiss computes statistics per dimension (instead of one global range), so it typically performs better on data where the value scales of different dimensions differ significantly.
+
+### Practical Observations
+
+In internal 128D/256D HNSW tests:
+- Recall with `sq8` was usually clearly better than with `sq4`.
+- SQ build/encode time was significantly higher than FLAT.
+- Query latency changed little with `sq8`, while the recall drop with `sq4` was more noticeable.
+
+The following bar charts are based on example benchmark data:
+
+![SQ build time vs rows](/images/vector-search/quantization-survey/sq-build-time-vs-rows.png)
+
+![SQ memory usage vs rows](/images/vector-search/quantization-survey/sq-memory-usage-vs-rows.png)
+
+## Product Quantization (PQ)
+
+### Principle
+
+PQ splits a `D`-dim vector into `M` subvectors (`D/M` dimensions each) and runs k-means quantization in each subspace.
+
+Key parameters:
+- `pq_m`: number of subquantizers
+- `pq_nbits`: bits per subvector code
+
+Generally, larger `pq_m` gives better accuracy but higher training and encoding cost.
+
+### Why PQ Queries Can Be Faster
+
+PQ can use a LUT (look-up table) for approximate distances:
+- Precompute distances from the query subvectors to the centroids of each subspace.
+- At query time, estimate the overall distance via table lookups and accumulation.
+
+This avoids full reconstruction and can reduce search-stage CPU cost in many scenarios.
+
+### Faiss Source-Level Note (PQ)
+
+Under the same implementation path, Faiss `ProductQuantizer` trains codebooks over the subspaces and stores the centroids in contiguous memory. A simplified shape is:
+
+```cpp
+void ProductQuantizer::train(size_t n, const float* x) {
+ Clustering clus(dsub, ksub, cp);
+ IndexFlatL2 index(dsub);
+ clus.train(n * M, x, index);
+ for (int m = 0; m < M; m++) {
+ set_params(clus.centroids.data(), m);
+ }
+}
+```
+
+The centroid layout can be read as `(M, ksub, dsub)`:
+- `M`: number of subquantizers;
+- `ksub`: codebook size per subspace (`2^pq_nbits`);
+- `dsub`: subvector dimension (`D / M`).
+
+### Practical Observations
+
+In the same internal tests:
+- PQ showed clear compression benefits.
+- PQ training/encoding overhead was relatively high.
+- Compared with SQ, PQ could often achieve better query-stage speed via LUT acceleration, but recall and build cost still depended on the data distribution and parameter combination.
+
+The following bar charts are based on example benchmark data:
+
+![PQ index size on disk vs rows](/images/vector-search/quantization-survey/pq-index-size-on-disk-vs-rows.png)
+
+![PQ build time vs rows](/images/vector-search/quantization-survey/pq-build-time-vs-rows.png)
+
+![PQ search time vs rows](/images/vector-search/quantization-survey/pq-search-time-vs-rows.png)
+
+## Selection Guide for Doris
+
+A practical order to follow:
+
+1. Memory is sufficient and recall comes first: `flat`.
+2. Want a low-risk memory reduction with more stable quality: `sq8`.
+3. Memory pressure is extreme and lower recall is acceptable: `sq4`.
+4. Want a compression/performance balance and can accept tuning: `pq`.
+
+Recommended validation process:
+
+1. Build a baseline with `flat` first.
+2. Test `sq8` first; compare recall and P95/P99 latency.
+3. If memory is still too high, test `pq` (`pq_m = D/2` can be a starting point).
+4. Consider `sq4` only when memory takes priority over recall.
+
+## Benchmarking Notes
+
+- Absolute times depend strongly on hardware, thread count, and dataset.
+- For side-by-side comparison, keep fixed:
+  - vector dimension,
+  - index parameters,
+  - segment size,
+  - query set and ground truth.
+- Cover both quality and cost metrics:
+  - Recall@K,
+  - index size,
+  - build time,
+  - query latency.
+
+## Related Documents
+
+- [Vector Search Overview](./overview.md)
+- [HNSW](./hnsw.md)
+- [IVF](./ivf.md)
+- [ANN Resource Estimation Guide](./resource-estimation.md)
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/vector-search/quantization-survey.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/vector-search/quantization-survey.md
new file mode 100644
index 00000000000..a0a4bddb35f
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/vector-search/quantization-survey.md
@@ -0,0 +1,213 @@
+---
+{
+ "title": "Vector Quantization Survey and Selection Guide",
+ "sidebar_label": "Quantization Survey",
+ "language": "zh-CN",
+ "description": "A vector quantization survey for Doris ANN, covering SQ, PQ, and selection guidance."
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This document introduces common vector quantization methods from an educational and engineering perspective, and gives selection guidance for Apache Doris ANN scenarios.
+
+## Why Quantization Is Needed
+
+In ANN scenarios (especially HNSW), indexes are often memory-bound. Quantization encodes high-precision vectors (such as float32) into lower-precision representations, trading an acceptable recall loss for lower memory usage.
+
+In Doris, ANN indexes control quantization through the `quantizer` property:
+- `flat`: no quantization (highest quality, highest memory)
+- `sq8`: 8-bit scalar quantization
+- `sq4`: 4-bit scalar quantization
+- `pq`: product quantization
+
+Example (HNSW + quantizer):
+
+```sql
+CREATE TABLE vector_tbl (
+ id BIGINT,
+ embedding ARRAY<FLOAT>,
+ INDEX ann_idx (embedding) USING ANN PROPERTIES (
+ "index_type" = "hnsw",
+ "metric_type" = "l2_distance",
+ "dim" = "768",
+ "quantizer" = "sq8"
+ )
+)
+DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 8
+PROPERTIES ("replication_num" = "3");
+```
+
+## Method Overview
+
+| Method | Core Idea | Typical Gain | Main Cost |
+|---|---|---|---|
+| SQ (Scalar Quantization) | Quantize each dimension independently | Significant memory reduction; simple implementation | Build cost higher than FLAT; recall drops more easily under stronger compression |
+| PQ (Product Quantization) | Split into subvectors and quantize per group | Better balance of compression and query speed in common scenarios | High training/encoding cost; parameters need tuning |
+
+Apache Doris currently uses an optimized Faiss implementation as the core engine for ANN vector indexing and search, so the SQ/PQ mechanics below map directly to Doris behavior in practice.
+
+## Scalar Quantization (SQ)
+
+### Principle
+
+SQ keeps the vector dimension unchanged and only lowers per-dimension numeric precision.
+
+A common min-max quantization mapping:
+- `max_code = (1 << b) - 1`
+- `scale = (max_val - min_val) / max_code`
+- `code = round((x - min_val) / scale)`
+
+Faiss SQ comes in two main styles:
+- Uniform: all dimensions share one min/max range.
+- Non-uniform: each dimension keeps its own min/max statistics.
+
+When value ranges differ greatly across dimensions, non-uniform SQ usually has smaller reconstruction error.
+
+### Key Characteristics
+
+- Strengths:
+  - Straightforward implementation, stable behavior.
+  - Predictable compression (roughly 4x smaller than float32 for `sq8`, roughly 8x for `sq4`).
+- Limitations:
+  - Still fixed-step bucketing in essence.
+  - If a dimension's distribution is clearly non-uniform (for example, long-tailed), quantization error rises.
+
+### Faiss Source-Level Note (SQ)
+
+In the optimized Faiss implementation path used by Doris, SQ training first collects min/max statistics, then slightly expands the range as needed to reduce out-of-range risk at the later add stage. A simplified shape is:
+
+```cpp
+void train_Uniform(..., const float* x, std::vector<float>& trained) {
+ trained.resize(2);
+ float& vmin = trained[0];
+ float& vmax = trained[1];
+    // scan all values to get min/max
+    // then optionally expand range by rs_arg
+}
+```
+
+For non-uniform SQ, Faiss computes statistics per dimension (instead of one global range), so it typically performs better on data where the value scales of different dimensions differ significantly.
+
+### Practical Observations
+
+In internal 128D/256D HNSW tests:
+- Recall with `sq8` was usually clearly better than with `sq4`.
+- SQ build/encode time was significantly higher than FLAT.
+- Query latency changed little with `sq8`, while the recall drop with `sq4` was more noticeable.
+
+The following bar charts are based on example benchmark data:
+
+![SQ build time vs rows](/images/vector-search/quantization-survey/sq-build-time-vs-rows.png)
+
+![SQ memory usage vs rows](/images/vector-search/quantization-survey/sq-memory-usage-vs-rows.png)
+
+## Product Quantization (PQ)
+
+### Principle
+
+PQ splits a `D`-dim vector into `M` subvectors (`D/M` dimensions each) and runs k-means quantization in each subspace.
+
+Key parameters:
+- `pq_m`: number of subquantizers
+- `pq_nbits`: bits per subvector code
+
+Generally, larger `pq_m` gives better accuracy but higher training and encoding cost.
+
+### Why PQ Queries Can Be Faster
+
+PQ can use a LUT (look-up table) for approximate distances:
+- Precompute distances from the query subvectors to the centroids of each subspace.
+- At query time, estimate the overall distance via table lookups and accumulation.
+
+This avoids full reconstruction and can reduce search-stage CPU cost in many scenarios.
+
+### Faiss Source-Level Note (PQ)
+
+Under the same implementation path, Faiss `ProductQuantizer` trains codebooks over the subspaces and stores the centroids in contiguous memory. A simplified shape is:
+
+```cpp
+void ProductQuantizer::train(size_t n, const float* x) {
+ Clustering clus(dsub, ksub, cp);
+ IndexFlatL2 index(dsub);
+ clus.train(n * M, x, index);
+ for (int m = 0; m < M; m++) {
+ set_params(clus.centroids.data(), m);
+ }
+}
+```
+
+The centroid layout can be read as `(M, ksub, dsub)`:
+- `M`: number of subquantizers;
+- `ksub`: codebook size per subspace (`2^pq_nbits`);
+- `dsub`: subvector dimension (`D / M`).
+
+### Practical Observations
+
+In the same internal tests:
+- PQ showed clear compression benefits.
+- PQ training/encoding overhead was relatively high.
+- Compared with SQ, PQ could often achieve better query-stage speed via LUT acceleration, but recall and build cost still depended on the data distribution and parameter combination.
+
+The following bar charts are based on example benchmark data:
+
+![PQ index size on disk vs rows](/images/vector-search/quantization-survey/pq-index-size-on-disk-vs-rows.png)
+
+![PQ build time vs rows](/images/vector-search/quantization-survey/pq-build-time-vs-rows.png)
+
+![PQ search time vs rows](/images/vector-search/quantization-survey/pq-search-time-vs-rows.png)
+
+## Selection Guide for Doris
+
+A practical order to follow:
+
+1. Memory is sufficient and recall comes first: `flat`.
+2. Want a low-risk memory reduction with more stable quality: `sq8`.
+3. Memory pressure is extreme and lower recall is acceptable: `sq4`.
+4. Want a compression/performance balance and can accept tuning: `pq`.
+
+Recommended validation process:
+
+1. Build a baseline with `flat` first.
+2. Test `sq8` first; compare recall and P95/P99 latency.
+3. If memory is still too high, test `pq` (`pq_m = D/2` can be a starting point).
+4. Consider `sq4` only when memory takes priority over recall.
+
+## Benchmarking Notes
+
+- Absolute times depend strongly on hardware, thread count, and dataset.
+- For side-by-side comparison, keep fixed:
+  - vector dimension,
+  - index parameters,
+  - segment size,
+  - query set and ground truth.
+- Cover both quality and cost metrics:
+  - Recall@K,
+  - index size,
+  - build time,
+  - query latency.
+
+## Related Documents
+
+- [Vector Search Overview](./overview.md)
+- [HNSW](./hnsw.md)
+- [IVF](./ivf.md)
+- [ANN Resource Estimation Guide](./resource-estimation.md)
diff --git a/sidebars.ts b/sidebars.ts
index 6a55db0c664..aa065f6a01d 100644
--- a/sidebars.ts
+++ b/sidebars.ts
@@ -333,6 +333,7 @@ const sidebars: SidebarsConfig = {
'ai/vector-search/ivf',
'ai/vector-search/index-management',
'ai/vector-search/resource-estimation',
+ 'ai/vector-search/quantization-survey',
'ai/vector-search/performance',
'ai/vector-search/performance-large-scale',
'ai/vector-search/behind-index',
diff --git
a/static/images/vector-search/quantization-survey/pq-build-time-vs-rows.png
b/static/images/vector-search/quantization-survey/pq-build-time-vs-rows.png
new file mode 100644
index 00000000000..645159ec8ed
Binary files /dev/null and
b/static/images/vector-search/quantization-survey/pq-build-time-vs-rows.png
differ
diff --git
a/static/images/vector-search/quantization-survey/pq-index-size-on-disk-vs-rows.png
b/static/images/vector-search/quantization-survey/pq-index-size-on-disk-vs-rows.png
new file mode 100644
index 00000000000..3ad76515f45
Binary files /dev/null and
b/static/images/vector-search/quantization-survey/pq-index-size-on-disk-vs-rows.png
differ
diff --git
a/static/images/vector-search/quantization-survey/pq-search-time-vs-rows.png
b/static/images/vector-search/quantization-survey/pq-search-time-vs-rows.png
new file mode 100644
index 00000000000..68f9c04ec01
Binary files /dev/null and
b/static/images/vector-search/quantization-survey/pq-search-time-vs-rows.png
differ
diff --git
a/static/images/vector-search/quantization-survey/sq-build-time-vs-rows.png
b/static/images/vector-search/quantization-survey/sq-build-time-vs-rows.png
new file mode 100644
index 00000000000..2493c88c408
Binary files /dev/null and
b/static/images/vector-search/quantization-survey/sq-build-time-vs-rows.png
differ
diff --git
a/static/images/vector-search/quantization-survey/sq-memory-usage-vs-rows.png
b/static/images/vector-search/quantization-survey/sq-memory-usage-vs-rows.png
new file mode 100644
index 00000000000..6925c75e7a0
Binary files /dev/null and
b/static/images/vector-search/quantization-survey/sq-memory-usage-vs-rows.png
differ
diff --git a/versioned_docs/version-4.x/ai/vector-search/quantization-survey.md
b/versioned_docs/version-4.x/ai/vector-search/quantization-survey.md
new file mode 100644
index 00000000000..6a2d032120b
--- /dev/null
+++ b/versioned_docs/version-4.x/ai/vector-search/quantization-survey.md
@@ -0,0 +1,212 @@
+---
+{
+ "title": "Vector Quantization Survey and Selection Guide",
+ "language": "en",
+ "description": "A practical survey of SQ, PQ, and related quantization
methods for Doris ANN, with trade-offs and selection guidance."
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This document introduces common vector quantization methods from a practical
perspective, and explains how to apply them in Apache Doris ANN workloads.
+
+## Why Quantization Is Needed
+
+For ANN workloads, especially HNSW, index memory can quickly become the
bottleneck. Quantization maps high-precision vectors (usually float32) to
lower-precision codes, trading a small amount of recall for lower memory usage.
+
+In Doris, quantization is controlled by the `quantizer` property in ANN
indexes:
+- `flat`: no quantization (highest quality, highest memory)
+- `sq8`: scalar quantization, 8-bit
+- `sq4`: scalar quantization, 4-bit
+- `pq`: product quantization
+
+Example (HNSW + quantizer):
+
+```sql
+CREATE TABLE vector_tbl (
+ id BIGINT,
+ embedding ARRAY<FLOAT>,
+ INDEX ann_idx (embedding) USING ANN PROPERTIES (
+ "index_type" = "hnsw",
+ "metric_type" = "l2_distance",
+ "dim" = "768",
+ "quantizer" = "sq8"
+ )
+)
+DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 8
+PROPERTIES ("replication_num" = "3");
+```
+
+## Method Overview
+
+| Method | Core Idea | Typical Gain | Main Cost |
+|---|---|---|---|
+| SQ (Scalar Quantization) | Quantize each dimension independently | Large
memory reduction, simple implementation | Build slower than FLAT; recall drops
with stronger compression |
+| PQ (Product Quantization) | Split vector into subvectors, quantize each
subvector with codebooks | Better compression/latency balance on many datasets
| Training/encoding cost is high; tuning is required |
+
+Apache Doris currently uses an optimized Faiss implementation as the core
engine for ANN vector indexing and search. The SQ/PQ behavior discussed below
is therefore directly relevant to Doris in practice.
+
+## Scalar Quantization (SQ)
+
+### Principle
+
+SQ keeps the vector dimension unchanged and only lowers per-dimension
precision.
+
+A standard min-max mapping per dimension is:
+- `max_code = (1 << b) - 1`
+- `scale = (max_val - min_val) / max_code`
+- `code = round((x - min_val) / scale)`
+
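As a concrete illustration of the mapping above, a minimal standalone encoder/decoder might look like this (an assumed `ScalarQuantizer` helper with clamping, for demonstration only; not the Doris or Faiss API):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Per-dimension min-max scalar quantization with b bits (illustrative only).
struct ScalarQuantizer {
    float vmin, vmax;  // trained range for this dimension
    int bits;          // e.g. 8 for sq8, 4 for sq4

    uint32_t encode(float x) const {
        const uint32_t max_code = (1u << bits) - 1;
        const float scale = (vmax - vmin) / max_code;
        float code = std::round((x - vmin) / scale);
        if (code < 0) code = 0;                    // clamp out-of-range inputs
        if (code > max_code) code = max_code;
        return static_cast<uint32_t>(code);
    }

    float decode(uint32_t code) const {
        const uint32_t max_code = (1u << bits) - 1;
        const float scale = (vmax - vmin) / max_code;
        return vmin + code * scale;  // representative value of the bucket
    }
};
```

With `bits = 8` the round-trip error of a single value is bounded by one quantization step (`scale`), which is where the recall loss comes from.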
+Faiss SQ has two styles:
+- Uniform: all dimensions share one min/max range.
+- Non-uniform: each dimension uses its own min/max.
+
+When dimensions have very different value ranges, non-uniform SQ usually gives
better reconstruction quality.
+
+### Key Characteristics
+
+- Strengths:
+ - Straightforward and stable.
+  - Predictable compression (roughly 4x smaller than float32 for `sq8`, roughly 8x for `sq4`).
+- Weaknesses:
+ - Assumes distribution can be bucketed with fixed steps.
+ - If a dimension is highly non-uniform (for example, strong long-tail),
quantization error can increase.
+
+### Faiss Source-Level Note (SQ)
+
+Under the optimized Faiss implementation path used by Doris, SQ training computes min/max statistics first, then expands the range slightly to reduce out-of-range risk at add time. A simplified shape is:
+
+```cpp
+void train_Uniform(..., const float* x, std::vector<float>& trained) {
+ trained.resize(2);
+ float& vmin = trained[0];
+ float& vmax = trained[1];
+ // scan all values to get min/max
+ // then optionally expand range by rs_arg
+}
+```
+
+For non-uniform SQ, Faiss computes statistics per dimension (instead of one
global range), which is why it typically behaves better when different
dimensions have very different value scales.
+
+### Practical Observations
+
+In the internal 128D/256D HNSW tests:
+- `sq8` generally preserved recall better than `sq4`.
+- SQ index build/add time was significantly higher than FLAT.
+- Search latency change was often small for `sq8`, while `sq4` had larger
recall drop.
+
+The following bar charts are based on example benchmark data:
+
+![SQ build time vs rows](/images/vector-search/quantization-survey/sq-build-time-vs-rows.png)
+
+![SQ memory usage vs rows](/images/vector-search/quantization-survey/sq-memory-usage-vs-rows.png)
+
+## Product Quantization (PQ)
+
+### Principle
+
+PQ splits a `D`-dim vector into `M` subvectors (`D/M` dimensions each), then
applies k-means codebooks to each subspace.
+
+Main parameters:
+- `pq_m`: number of subquantizers (subvectors)
+- `pq_nbits`: bits per subvector code
+
+Larger `pq_m` usually improves quality but increases training/encoding cost.
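The memory impact of these two parameters can be estimated with simple arithmetic. The helpers below are hypothetical (not a Doris API) and just restate the definitions:

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to store one encoded vector: pq_m codes of pq_nbits each.
size_t pq_code_bytes(size_t pq_m, size_t pq_nbits) {
    return (pq_m * pq_nbits + 7) / 8;  // round up to whole bytes
}

// Total floats across all codebooks: pq_m subspaces, 2^pq_nbits centroids
// each, dim / pq_m dimensions per centroid (simplifies to 2^pq_nbits * dim).
size_t pq_codebook_floats(size_t dim, size_t pq_m, size_t pq_nbits) {
    const size_t ksub = static_cast<size_t>(1) << pq_nbits;
    return pq_m * ksub * (dim / pq_m);
}
```

For example, `dim = 768`, `pq_m = 96`, `pq_nbits = 8` gives 96 bytes per encoded vector, versus `768 * 4 = 3072` bytes for raw float32, plus a fixed codebook of `256 * 768` floats shared by all vectors.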
+
+### Why PQ Can Be Faster at Query Time
+
+PQ can use LUT (look-up table) distance estimation:
+- Precompute distances between query subvectors and codebook centroids.
+- Approximate full-vector distance by table lookups + accumulation.
+
+This avoids full reconstruction and can reduce search CPU cost.
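The two steps can be sketched as a plain ADC (asymmetric distance computation) routine. This is an illustrative sketch, not the Faiss implementation; it assumes centroids stored contiguously as `(M, ksub, dsub)` and one `uint8` code per subvector:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Squared-L2 distance estimate between a query and one PQ-encoded vector.
float adc_distance(const std::vector<float>& query,      // size M * dsub
                   const std::vector<uint8_t>& code,     // size M
                   const std::vector<float>& centroids,  // size M * ksub * dsub
                   int M, int ksub, int dsub) {
    // Step 1: per-subspace lookup table of query-to-centroid squared distances.
    std::vector<float> lut(M * ksub, 0.0f);
    for (int m = 0; m < M; m++)
        for (int k = 0; k < ksub; k++)
            for (int d = 0; d < dsub; d++) {
                float diff = query[m * dsub + d] -
                             centroids[(m * ksub + k) * dsub + d];
                lut[m * ksub + k] += diff * diff;
            }
    // Step 2: the distance to any encoded vector is M lookups + adds.
    float dist = 0.0f;
    for (int m = 0; m < M; m++)
        dist += lut[m * ksub + code[m]];
    return dist;
}
```

The table costs `M * ksub` subvector distance computations once per query; after that, each candidate costs only `M` lookups and adds, independent of the original dimension `D`.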
+
+### Faiss Source-Level Note (PQ)
+
+Under the same implementation path, Faiss `ProductQuantizer` trains codebooks
over subspaces and stores them in a contiguous centroid table. A simplified
shape is:
+
+```cpp
+void ProductQuantizer::train(size_t n, const float* x) {
+ Clustering clus(dsub, ksub, cp);
+ IndexFlatL2 index(dsub);
+ clus.train(n * M, x, index);
+ for (int m = 0; m < M; m++) {
+ set_params(clus.centroids.data(), m);
+ }
+}
+```
+
+Centroids are laid out as `(M, ksub, dsub)`, where:
+- `M`: number of subquantizers,
+- `ksub`: codebook size per subspace (`2^pq_nbits`),
+- `dsub`: subvector dimension (`D / M`).
+
+### Practical Observations
+
+In the same internal tests:
+- PQ showed clear compression benefits.
+- PQ encoding/training overhead was high.
+- Compared with SQ, PQ often had better search-time behavior due to LUT
acceleration, but recall/build trade-offs depended on data and parameters.
+
+The following bar charts are based on example benchmark data:
+
+![PQ index size on disk vs rows](/images/vector-search/quantization-survey/pq-index-size-on-disk-vs-rows.png)
+
+![PQ build time vs rows](/images/vector-search/quantization-survey/pq-build-time-vs-rows.png)
+
+![PQ search time vs rows](/images/vector-search/quantization-survey/pq-search-time-vs-rows.png)
+
+## Practical Selection Guide for Doris
+
+Use this as a starting point:
+
+1. Memory is sufficient and recall is top priority: `flat`.
+2. Need low risk compression with relatively stable quality: `sq8`.
+3. Extreme memory pressure and can accept lower recall: `sq4`.
+4. Need stronger memory-performance balance and can spend time tuning: `pq`.
+
+Recommended validation process:
+
+1. Start with `flat` as baseline.
+2. Test `sq8` first; compare recall and P95/P99 latency.
+3. If memory is still too high, test `pq` (`pq_m = D/2` as first trial).
+4. Use `sq4` only when memory reduction has higher priority than recall.
+
+## Benchmarking Notes
+
+- Absolute times are hardware/thread/dataset dependent.
+- Compare methods under the same:
+ - vector dimension,
+ - index parameters,
+ - segment size,
+ - query set and ground truth.
+- Evaluate both quality and cost:
+ - Recall@K,
+ - index size,
+ - build time,
+ - query latency.
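For the quality side, Recall@K for a single query is simply the overlap between returned ids and ground-truth ids. A minimal order-insensitive sketch (hypothetical helper):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Fraction of the K ground-truth neighbors present among the K returned ids.
float recall_at_k(std::vector<int> returned, std::vector<int> truth) {
    std::sort(returned.begin(), returned.end());
    std::sort(truth.begin(), truth.end());
    std::vector<int> hit;
    std::set_intersection(returned.begin(), returned.end(),
                          truth.begin(), truth.end(),
                          std::back_inserter(hit));
    return static_cast<float>(hit.size()) / truth.size();
}
```

Averaging this value over the query set, once against each quantizer's results, gives the Recall@K numbers to compare alongside size, build time, and latency.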
+
+## Related Documents
+
+- [Overview](./overview.md)
+- [HNSW](./hnsw.md)
+- [IVF](./ivf.md)
+- [ANN Resource Estimation Guide](./resource-estimation.md)
diff --git a/versioned_sidebars/version-4.x-sidebars.json
b/versioned_sidebars/version-4.x-sidebars.json
index 769ffbfe76f..a61946af0f0 100644
--- a/versioned_sidebars/version-4.x-sidebars.json
+++ b/versioned_sidebars/version-4.x-sidebars.json
@@ -333,6 +333,7 @@
"ai/vector-search/ivf",
"ai/vector-search/index-management",
"ai/vector-search/resource-estimation",
+ "ai/vector-search/quantization-survey",
"ai/vector-search/performance",
"ai/vector-search/performance-large-scale",
"ai/vector-search/behind-index"
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]