This is an automated email from the ASF dual-hosted git repository.
yiguolei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new bd609ea43dc [doc] add vector quantization guide for current and 4.x
(#3461)
bd609ea43dc is described below
commit bd609ea43dc8a304c4965287029328bc5b781e5f
Author: zhiqiang <[email protected]>
AuthorDate: Fri Mar 13 13:53:56 2026 +0800
[doc] add vector quantization guide for current and 4.x (#3461)
## Summary
This PR adds a new vector quantization guide for Doris ANN and syncs it
to both `current` and `4.x` docs.
### Added docs
- `docs/ai/vector-search/quantization-survey.md`
- `versioned_docs/version-4.x/ai/vector-search/quantization-survey.md`
- `i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search/quantization-survey.md`
- `i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/vector-search/quantization-survey.md`
### Sidebar updates
- `sidebars.ts`
- `versioned_sidebars/version-4.x-sidebars.json`
### Assets
- `static/images/vector-search/quantization-survey/*.png`
- SQ: build time vs rows, memory usage vs rows
- PQ: index size on disk vs rows, build time vs rows, search time vs rows
## Notes
- Removed RaBitQ content because Doris does not currently support it.
- Kept the doc in an educational style with practical Doris guidance.
- Preserved a concise Faiss source-level section with proper Doris/Faiss
background context.
---
docs/ai/vector-search/quantization-survey.md | 212 ++++++++++++++++++++
.../ai/vector-search/quantization-survey.md | 213 +++++++++++++++++++++
.../ai/vector-search/quantization-survey.md | 213 +++++++++++++++++++++
sidebars.ts | 1 +
.../quantization-survey/pq-build-time-vs-rows.png | Bin 0 -> 58592 bytes
.../pq-index-size-on-disk-vs-rows.png | Bin 0 -> 67005 bytes
.../quantization-survey/pq-search-time-vs-rows.png | Bin 0 -> 74165 bytes
.../quantization-survey/sq-build-time-vs-rows.png | Bin 0 -> 42287 bytes
.../sq-memory-usage-vs-rows.png | Bin 0 -> 44927 bytes
.../ai/vector-search/quantization-survey.md | 212 ++++++++++++++++++++
versioned_sidebars/version-4.x-sidebars.json | 1 +
11 files changed, 852 insertions(+)
diff --git a/docs/ai/vector-search/quantization-survey.md
b/docs/ai/vector-search/quantization-survey.md
new file mode 100644
index 00000000000..6a2d032120b
--- /dev/null
+++ b/docs/ai/vector-search/quantization-survey.md
@@ -0,0 +1,212 @@
+---
+{
+ "title": "Vector Quantization Survey and Selection Guide",
+ "language": "en",
+ "description": "A practical survey of SQ, PQ, and related quantization
methods for Doris ANN, with trade-offs and selection guidance."
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This document introduces common vector quantization methods from a practical
perspective, and explains how to apply them in Apache Doris ANN workloads.
+
+## Why Quantization Is Needed
+
+For ANN workloads, especially HNSW, index memory can quickly become the
bottleneck. Quantization maps high-precision vectors (usually float32) to
lower-precision codes, trading a small amount of recall for lower memory usage.
+
+In Doris, quantization is controlled by the `quantizer` property in ANN
indexes:
+- `flat`: no quantization (highest quality, highest memory)
+- `sq8`: scalar quantization, 8-bit
+- `sq4`: scalar quantization, 4-bit
+- `pq`: product quantization
+
+Example (HNSW + quantizer):
+
+```sql
+CREATE TABLE vector_tbl (
+ id BIGINT,
+ embedding ARRAY<FLOAT>,
+ INDEX ann_idx (embedding) USING ANN PROPERTIES (
+ "index_type" = "hnsw",
+ "metric_type" = "l2_distance",
+ "dim" = "768",
+ "quantizer" = "sq8"
+ )
+)
+DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 8
+PROPERTIES ("replication_num" = "3");
+```
+
+## Method Overview
+
+| Method | Core Idea | Typical Gain | Main Cost |
+|---|---|---|---|
+| SQ (Scalar Quantization) | Quantize each dimension independently | Large
memory reduction, simple implementation | Build slower than FLAT; recall drops
with stronger compression |
+| PQ (Product Quantization) | Split vector into subvectors, quantize each
subvector with codebooks | Better compression/latency balance on many datasets
| Training/encoding cost is high; tuning is required |
+
+Apache Doris currently uses an optimized Faiss implementation as the core
engine for ANN vector indexing and search. The SQ/PQ behavior discussed below
is therefore directly relevant to Doris in practice.
+
+## Scalar Quantization (SQ)
+
+### Principle
+
+SQ keeps the vector dimension unchanged and only lowers per-dimension
precision.
+
+A standard min-max mapping per dimension is:
+- `max_code = (1 << b) - 1`
+- `scale = (max_val - min_val) / max_code`
+- `code = round((x - min_val) / scale)`
+
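As a concrete illustration of the mapping above, a minimal standalone encoder/decoder might look like this (an assumed `ScalarQuantizer` helper with clamping, for demonstration only; not the Doris or Faiss API):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Per-dimension min-max scalar quantization with b bits (illustrative only).
struct ScalarQuantizer {
    float vmin, vmax;  // trained range for this dimension
    int bits;          // e.g. 8 for sq8, 4 for sq4

    uint32_t encode(float x) const {
        const uint32_t max_code = (1u << bits) - 1;
        const float scale = (vmax - vmin) / max_code;
        float code = std::round((x - vmin) / scale);
        if (code < 0) code = 0;                    // clamp out-of-range inputs
        if (code > max_code) code = max_code;
        return static_cast<uint32_t>(code);
    }

    float decode(uint32_t code) const {
        const uint32_t max_code = (1u << bits) - 1;
        const float scale = (vmax - vmin) / max_code;
        return vmin + code * scale;  // representative value of the bucket
    }
};
```

With `bits = 8` the round-trip error of a single value is bounded by one quantization step (`scale`), which is where the recall loss comes from.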
+Faiss SQ has two styles:
+- Uniform: all dimensions share one min/max range.
+- Non-uniform: each dimension uses its own min/max.
+
+When dimensions have very different value ranges, non-uniform SQ usually gives
better reconstruction quality.
+
+### Key Characteristics
+
+- Strengths:
+ - Straightforward and stable.
+  - Predictable compression (roughly 4x smaller than float32 for `sq8`, roughly 8x for `sq4`).
+- Weaknesses:
+ - Assumes distribution can be bucketed with fixed steps.
+ - If a dimension is highly non-uniform (for example, strong long-tail),
quantization error can increase.
+
+### Faiss Source-Level Note (SQ)
+
+Under the optimized Faiss implementation path used by Doris, SQ training computes min/max statistics first, then expands the range slightly to reduce out-of-range risk at add time. A simplified shape is:
+
+```cpp
+void train_Uniform(..., const float* x, std::vector<float>& trained) {
+ trained.resize(2);
+ float& vmin = trained[0];
+ float& vmax = trained[1];
+ // scan all values to get min/max
+ // then optionally expand range by rs_arg
+}
+```
+
+For non-uniform SQ, Faiss computes statistics per dimension (instead of one
global range), which is why it typically behaves better when different
dimensions have very different value scales.
+
+### Practical Observations
+
+In the internal 128D/256D HNSW tests:
+- `sq8` generally preserved recall better than `sq4`.
+- SQ index build/add time was significantly higher than FLAT.
+- Search latency change was often small for `sq8`, while `sq4` had larger
recall drop.
+
+The following bar charts are based on example benchmark data:
+
+![SQ build time vs rows](/images/vector-search/quantization-survey/sq-build-time-vs-rows.png)
+
+![SQ memory usage vs rows](/images/vector-search/quantization-survey/sq-memory-usage-vs-rows.png)
+
+## Product Quantization (PQ)
+
+### Principle
+
+PQ splits a `D`-dim vector into `M` subvectors (`D/M` dimensions each), then
applies k-means codebooks to each subspace.
+
+Main parameters:
+- `pq_m`: number of subquantizers (subvectors)
+- `pq_nbits`: bits per subvector code
+
+Larger `pq_m` usually improves quality but increases training/encoding cost.
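The memory impact of these two parameters can be estimated with simple arithmetic. The helpers below are hypothetical (not a Doris API) and just restate the definitions:

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to store one encoded vector: pq_m codes of pq_nbits each.
size_t pq_code_bytes(size_t pq_m, size_t pq_nbits) {
    return (pq_m * pq_nbits + 7) / 8;  // round up to whole bytes
}

// Total floats across all codebooks: pq_m subspaces, 2^pq_nbits centroids
// each, dim / pq_m dimensions per centroid (simplifies to 2^pq_nbits * dim).
size_t pq_codebook_floats(size_t dim, size_t pq_m, size_t pq_nbits) {
    const size_t ksub = static_cast<size_t>(1) << pq_nbits;
    return pq_m * ksub * (dim / pq_m);
}
```

For example, `dim = 768`, `pq_m = 96`, `pq_nbits = 8` gives 96 bytes per encoded vector, versus `768 * 4 = 3072` bytes for raw float32, plus a fixed codebook of `256 * 768` floats shared by all vectors.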
+
+### Why PQ Can Be Faster at Query Time
+
+PQ can use LUT (look-up table) distance estimation:
+- Precompute distances between query subvectors and codebook centroids.
+- Approximate full-vector distance by table lookups + accumulation.
+
+This avoids full reconstruction and can reduce search CPU cost.
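The two steps can be sketched as a plain ADC (asymmetric distance computation) routine. This is an illustrative sketch, not the Faiss implementation; it assumes centroids stored contiguously as `(M, ksub, dsub)` and one `uint8` code per subvector:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Squared-L2 distance estimate between a query and one PQ-encoded vector.
float adc_distance(const std::vector<float>& query,      // size M * dsub
                   const std::vector<uint8_t>& code,     // size M
                   const std::vector<float>& centroids,  // size M * ksub * dsub
                   int M, int ksub, int dsub) {
    // Step 1: per-subspace lookup table of query-to-centroid squared distances.
    std::vector<float> lut(M * ksub, 0.0f);
    for (int m = 0; m < M; m++)
        for (int k = 0; k < ksub; k++)
            for (int d = 0; d < dsub; d++) {
                float diff = query[m * dsub + d] -
                             centroids[(m * ksub + k) * dsub + d];
                lut[m * ksub + k] += diff * diff;
            }
    // Step 2: the distance to any encoded vector is M lookups + adds.
    float dist = 0.0f;
    for (int m = 0; m < M; m++)
        dist += lut[m * ksub + code[m]];
    return dist;
}
```

The table costs `M * ksub` subvector distance computations once per query; after that, each candidate costs only `M` lookups and adds, independent of the original dimension `D`.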
+
+### Faiss Source-Level Note (PQ)
+
+Under the same implementation path, Faiss `ProductQuantizer` trains codebooks
over subspaces and stores them in a contiguous centroid table. A simplified
shape is:
+
+```cpp
+void ProductQuantizer::train(size_t n, const float* x) {
+ Clustering clus(dsub, ksub, cp);
+ IndexFlatL2 index(dsub);
+ clus.train(n * M, x, index);
+ for (int m = 0; m < M; m++) {
+ set_params(clus.centroids.data(), m);
+ }
+}
+```
+
+Centroids are laid out as `(M, ksub, dsub)`, where:
+- `M`: number of subquantizers,
+- `ksub`: codebook size per subspace (`2^pq_nbits`),
+- `dsub`: subvector dimension (`D / M`).
+
+### Practical Observations
+
+In the same internal tests:
+- PQ showed clear compression benefits.
+- PQ encoding/training overhead was high.
+- Compared with SQ, PQ often had better search-time behavior due to LUT
acceleration, but recall/build trade-offs depended on data and parameters.
+
+The following bar charts are based on example benchmark data:
+
+![PQ index size on disk vs rows](/images/vector-search/quantization-survey/pq-index-size-on-disk-vs-rows.png)
+
+![PQ build time vs rows](/images/vector-search/quantization-survey/pq-build-time-vs-rows.png)
+
+![PQ search time vs rows](/images/vector-search/quantization-survey/pq-search-time-vs-rows.png)
+
+## Practical Selection Guide for Doris
+
+Use this as a starting point:
+
+1. Memory is sufficient and recall is top priority: `flat`.
+2. Need low risk compression with relatively stable quality: `sq8`.
+3. Extreme memory pressure and can accept lower recall: `sq4`.
+4. Need stronger memory-performance balance and can spend time tuning: `pq`.
+
+Recommended validation process:
+
+1. Start with `flat` as baseline.
+2. Test `sq8` first; compare recall and P95/P99 latency.
+3. If memory is still too high, test `pq` (`pq_m = D/2` as first trial).
+4. Use `sq4` only when memory reduction has higher priority than recall.
+
+## Benchmarking Notes
+
+- Absolute times are hardware/thread/dataset dependent.
+- Compare methods under the same:
+ - vector dimension,
+ - index parameters,
+ - segment size,
+ - query set and ground truth.
+- Evaluate both quality and cost:
+ - Recall@K,
+ - index size,
+ - build time,
+ - query latency.
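For the quality side, Recall@K for a single query is simply the overlap between returned ids and ground-truth ids. A minimal order-insensitive sketch (hypothetical helper):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Fraction of the K ground-truth neighbors present among the K returned ids.
float recall_at_k(std::vector<int> returned, std::vector<int> truth) {
    std::sort(returned.begin(), returned.end());
    std::sort(truth.begin(), truth.end());
    std::vector<int> hit;
    std::set_intersection(returned.begin(), returned.end(),
                          truth.begin(), truth.end(),
                          std::back_inserter(hit));
    return static_cast<float>(hit.size()) / truth.size();
}
```

Averaging this value over the query set, once against each quantizer's results, gives the Recall@K numbers to compare alongside size, build time, and latency.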
+
+## Related Documents
+
+- [Overview](./overview.md)
+- [HNSW](./hnsw.md)
+- [IVF](./ivf.md)
+- [ANN Resource Estimation Guide](./resource-estimation.md)
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search/quantization-survey.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search/quantization-survey.md
new file mode 100644
index 00000000000..a0a4bddb35f
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search/quantization-survey.md
@@ -0,0 +1,213 @@
+---
+{
+ "title": "Vector Quantization Survey and Selection Guide",
+ "sidebar_label": "Quantization Survey",
+ "language": "zh-CN",
+ "description": "A vector quantization survey for Doris ANN, covering SQ, PQ, and selection guidance."
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This document introduces common vector quantization methods from an educational and engineering perspective, and gives selection guidance for Apache Doris ANN scenarios.
+
+## Why Quantization Is Needed
+
+In ANN scenarios (especially HNSW), indexes are often memory-bound. Quantization encodes high-precision vectors (such as float32) into lower-precision representations, trading an acceptable recall loss for lower memory usage.
+
+In Doris, ANN indexes control quantization through the `quantizer` property:
+- `flat`: no quantization (highest quality, highest memory)
+- `sq8`: 8-bit scalar quantization
+- `sq4`: 4-bit scalar quantization
+- `pq`: product quantization
+
+Example (HNSW + quantizer):
+
+```sql
+CREATE TABLE vector_tbl (
+ id BIGINT,
+ embedding ARRAY<FLOAT>,
+ INDEX ann_idx (embedding) USING ANN PROPERTIES (
+ "index_type" = "hnsw",
+ "metric_type" = "l2_distance",
+ "dim" = "768",
+ "quantizer" = "sq8"
+ )
+)
+DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 8
+PROPERTIES ("replication_num" = "3");
+```
+
+## Method Overview
+
+| Method | Core Idea | Typical Gain | Main Cost |
+|---|---|---|---|
+| SQ (Scalar Quantization) | Quantize each dimension independently | Significant memory reduction; simple implementation | Build cost higher than FLAT; recall drops more easily under stronger compression |
+| PQ (Product Quantization) | Split into subvectors and quantize per group | Better balance of compression and query speed in common scenarios | High training/encoding cost; parameters need tuning |
+
+Apache Doris currently uses an optimized Faiss implementation as the core engine for ANN vector indexing and search, so the SQ/PQ mechanics below map directly to Doris behavior in practice.
+
+## Scalar Quantization (SQ)
+
+### Principle
+
+SQ keeps the vector dimension unchanged and only lowers per-dimension numeric precision.
+
+A common min-max quantization mapping:
+- `max_code = (1 << b) - 1`
+- `scale = (max_val - min_val) / max_code`
+- `code = round((x - min_val) / scale)`
+
+Faiss SQ comes in two main styles:
+- Uniform: all dimensions share one min/max range.
+- Non-uniform: each dimension keeps its own min/max statistics.
+
+When value ranges differ greatly across dimensions, non-uniform SQ usually has smaller reconstruction error.
+
+### Key Characteristics
+
+- Strengths:
+  - Straightforward implementation, stable behavior.
+  - Predictable compression (roughly 4x smaller than float32 for `sq8`, roughly 8x for `sq4`).
+- Limitations:
+  - Still fixed-step bucketing in essence.
+  - If a dimension's distribution is clearly non-uniform (for example, long-tailed), quantization error rises.
+
+### Faiss Source-Level Note (SQ)
+
+In the optimized Faiss implementation path used by Doris, SQ training first collects min/max statistics, then slightly expands the range as needed to reduce out-of-range risk at the later add stage. A simplified shape is:
+
+```cpp
+void train_Uniform(..., const float* x, std::vector<float>& trained) {
+ trained.resize(2);
+ float& vmin = trained[0];
+ float& vmax = trained[1];
+    // scan all values to get min/max
+    // then optionally expand range by rs_arg
+}
+```
+
+For non-uniform SQ, Faiss computes statistics per dimension (instead of one global range), so it typically performs better on data where the value scales of different dimensions differ significantly.
+
+### Practical Observations
+
+In internal 128D/256D HNSW tests:
+- Recall with `sq8` was usually clearly better than with `sq4`.
+- SQ build/encode time was significantly higher than FLAT.
+- Query latency changed little with `sq8`, while the recall drop with `sq4` was more noticeable.
+
+The following bar charts are based on example benchmark data:
+
+![SQ build time vs rows](/images/vector-search/quantization-survey/sq-build-time-vs-rows.png)
+
+![SQ memory usage vs rows](/images/vector-search/quantization-survey/sq-memory-usage-vs-rows.png)
+
+## Product Quantization (PQ)
+
+### Principle
+
+PQ splits a `D`-dim vector into `M` subvectors (`D/M` dimensions each) and runs k-means quantization in each subspace.
+
+Key parameters:
+- `pq_m`: number of subquantizers
+- `pq_nbits`: bits per subvector code
+
+Generally, larger `pq_m` gives better accuracy but higher training and encoding cost.
+
+### Why PQ Queries Can Be Faster
+
+PQ can use a LUT (look-up table) for approximate distances:
+- Precompute distances from the query subvectors to the centroids of each subspace.
+- At query time, estimate the overall distance via table lookups and accumulation.
+
+This avoids full reconstruction and can reduce search-stage CPU cost in many scenarios.
+
+### Faiss Source-Level Note (PQ)
+
+Under the same implementation path, Faiss `ProductQuantizer` trains codebooks over the subspaces and stores the centroids in contiguous memory. A simplified shape is:
+
+```cpp
+void ProductQuantizer::train(size_t n, const float* x) {
+ Clustering clus(dsub, ksub, cp);
+ IndexFlatL2 index(dsub);
+ clus.train(n * M, x, index);
+ for (int m = 0; m < M; m++) {
+ set_params(clus.centroids.data(), m);
+ }
+}
+```
+
+The centroid layout can be read as `(M, ksub, dsub)`:
+- `M`: number of subquantizers;
+- `ksub`: codebook size per subspace (`2^pq_nbits`);
+- `dsub`: subvector dimension (`D / M`).
+
+### Practical Observations
+
+In the same internal tests:
+- PQ showed clear compression benefits.
+- PQ training/encoding overhead was relatively high.
+- Compared with SQ, PQ could often achieve better query-stage speed via LUT acceleration, but recall and build cost still depended on the data distribution and parameter combination.
+
+The following bar charts are based on example benchmark data:
+
+![PQ index size on disk vs rows](/images/vector-search/quantization-survey/pq-index-size-on-disk-vs-rows.png)
+
+![PQ build time vs rows](/images/vector-search/quantization-survey/pq-build-time-vs-rows.png)
+
+![PQ search time vs rows](/images/vector-search/quantization-survey/pq-search-time-vs-rows.png)
+
+## Selection Guide for Doris
+
+A practical order to follow:
+
+1. Memory is sufficient and recall comes first: `flat`.
+2. Want a low-risk memory reduction with more stable quality: `sq8`.
+3. Memory pressure is extreme and lower recall is acceptable: `sq4`.
+4. Want a compression/performance balance and can accept tuning: `pq`.
+
+Recommended validation process:
+
+1. Build a baseline with `flat` first.
+2. Test `sq8` first; compare recall and P95/P99 latency.
+3. If memory is still too high, test `pq` (`pq_m = D/2` can be a starting point).
+4. Consider `sq4` only when memory takes priority over recall.
+
+## Benchmarking Notes
+
+- Absolute times depend strongly on hardware, thread count, and dataset.
+- For side-by-side comparison, keep fixed:
+  - vector dimension,
+  - index parameters,
+  - segment size,
+  - query set and ground truth.
+- Cover both quality and cost metrics:
+  - Recall@K,
+  - index size,
+  - build time,
+  - query latency.
+
+## Related Documents
+
+- [Vector Search Overview](./overview.md)
+- [HNSW](./hnsw.md)
+- [IVF](./ivf.md)
+- [ANN Resource Estimation Guide](./resource-estimation.md)
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/vector-search/quantization-survey.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/vector-search/quantization-survey.md
new file mode 100644
index 00000000000..a0a4bddb35f
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/vector-search/quantization-survey.md
@@ -0,0 +1,213 @@
+---
+{
+ "title": "Vector Quantization Survey and Selection Guide",
+ "sidebar_label": "Quantization Survey",
+ "language": "zh-CN",
+ "description": "A vector quantization survey for Doris ANN, covering SQ, PQ, and selection guidance."
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This document introduces common vector quantization methods from an educational and engineering perspective, and gives selection guidance for Apache Doris ANN scenarios.
+
+## Why Quantization Is Needed
+
+In ANN scenarios (especially HNSW), indexes are often memory-bound. Quantization encodes high-precision vectors (such as float32) into lower-precision representations, trading an acceptable recall loss for lower memory usage.
+
+In Doris, ANN indexes control quantization through the `quantizer` property:
+- `flat`: no quantization (highest quality, highest memory)
+- `sq8`: 8-bit scalar quantization
+- `sq4`: 4-bit scalar quantization
+- `pq`: product quantization
+
+Example (HNSW + quantizer):
+
+```sql
+CREATE TABLE vector_tbl (
+ id BIGINT,
+ embedding ARRAY<FLOAT>,
+ INDEX ann_idx (embedding) USING ANN PROPERTIES (
+ "index_type" = "hnsw",
+ "metric_type" = "l2_distance",
+ "dim" = "768",
+ "quantizer" = "sq8"
+ )
+)
+DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 8
+PROPERTIES ("replication_num" = "3");
+```
+
+## Method Overview
+
+| Method | Core Idea | Typical Gain | Main Cost |
+|---|---|---|---|
+| SQ (Scalar Quantization) | Quantize each dimension independently | Significant memory reduction; simple implementation | Build cost higher than FLAT; recall drops more easily under stronger compression |
+| PQ (Product Quantization) | Split into subvectors and quantize per group | Better balance of compression and query speed in common scenarios | High training/encoding cost; parameters need tuning |
+
+Apache Doris currently uses an optimized Faiss implementation as the core engine for ANN vector indexing and search, so the SQ/PQ mechanics below map directly to Doris behavior in practice.
+
+## Scalar Quantization (SQ)
+
+### Principle
+
+SQ keeps the vector dimension unchanged and only lowers per-dimension numeric precision.
+
+A common min-max quantization mapping:
+- `max_code = (1 << b) - 1`
+- `scale = (max_val - min_val) / max_code`
+- `code = round((x - min_val) / scale)`
+
+Faiss SQ comes in two main styles:
+- Uniform: all dimensions share one min/max range.
+- Non-uniform: each dimension keeps its own min/max statistics.
+
+When value ranges differ greatly across dimensions, non-uniform SQ usually has smaller reconstruction error.
+
+### Key Characteristics
+
+- Strengths:
+  - Straightforward implementation, stable behavior.
+  - Predictable compression (roughly 4x smaller than float32 for `sq8`, roughly 8x for `sq4`).
+- Limitations:
+  - Still fixed-step bucketing in essence.
+  - If a dimension's distribution is clearly non-uniform (for example, long-tailed), quantization error rises.
+
+### Faiss Source-Level Note (SQ)
+
+In the optimized Faiss implementation path used by Doris, SQ training first collects min/max statistics, then slightly expands the range as needed to reduce out-of-range risk at the later add stage. A simplified shape is:
+
+```cpp
+void train_Uniform(..., const float* x, std::vector<float>& trained) {
+ trained.resize(2);
+ float& vmin = trained[0];
+ float& vmax = trained[1];
+    // scan all values to get min/max
+    // then optionally expand range by rs_arg
+}
+```
+
+For non-uniform SQ, Faiss computes statistics per dimension (instead of one global range), so it typically performs better on data where the value scales of different dimensions differ significantly.
+
+### Practical Observations
+
+In internal 128D/256D HNSW tests:
+- Recall with `sq8` was usually clearly better than with `sq4`.
+- SQ build/encode time was significantly higher than FLAT.
+- Query latency changed little with `sq8`, while the recall drop with `sq4` was more noticeable.
+
+The following bar charts are based on example benchmark data:
+
+![SQ build time vs rows](/images/vector-search/quantization-survey/sq-build-time-vs-rows.png)
+
+![SQ memory usage vs rows](/images/vector-search/quantization-survey/sq-memory-usage-vs-rows.png)
+
+## Product Quantization (PQ)
+
+### Principle
+
+PQ splits a `D`-dim vector into `M` subvectors (`D/M` dimensions each) and runs k-means quantization in each subspace.
+
+Key parameters:
+- `pq_m`: number of subquantizers
+- `pq_nbits`: bits per subvector code
+
+Generally, larger `pq_m` gives better accuracy but higher training and encoding cost.
+
+### Why PQ Queries Can Be Faster
+
+PQ can use a LUT (look-up table) for approximate distances:
+- Precompute distances from the query subvectors to the centroids of each subspace.
+- At query time, estimate the overall distance via table lookups and accumulation.
+
+This avoids full reconstruction and can reduce search-stage CPU cost in many scenarios.
+
+### Faiss Source-Level Note (PQ)
+
+Under the same implementation path, Faiss `ProductQuantizer` trains codebooks over the subspaces and stores the centroids in contiguous memory. A simplified shape is:
+
+```cpp
+void ProductQuantizer::train(size_t n, const float* x) {
+ Clustering clus(dsub, ksub, cp);
+ IndexFlatL2 index(dsub);
+ clus.train(n * M, x, index);
+ for (int m = 0; m < M; m++) {
+ set_params(clus.centroids.data(), m);
+ }
+}
+```
+
+The centroid layout can be read as `(M, ksub, dsub)`:
+- `M`: number of subquantizers;
+- `ksub`: codebook size per subspace (`2^pq_nbits`);
+- `dsub`: subvector dimension (`D / M`).
+
+### Practical Observations
+
+In the same internal tests:
+- PQ showed clear compression benefits.
+- PQ training/encoding overhead was relatively high.
+- Compared with SQ, PQ could often achieve better query-stage speed via LUT acceleration, but recall and build cost still depended on the data distribution and parameter combination.
+
+The following bar charts are based on example benchmark data:
+
+![PQ index size on disk vs rows](/images/vector-search/quantization-survey/pq-index-size-on-disk-vs-rows.png)
+
+![PQ build time vs rows](/images/vector-search/quantization-survey/pq-build-time-vs-rows.png)
+
+![PQ search time vs rows](/images/vector-search/quantization-survey/pq-search-time-vs-rows.png)
+
+## Selection Guide for Doris
+
+A practical order to follow:
+
+1. Memory is sufficient and recall comes first: `flat`.
+2. Want a low-risk memory reduction with more stable quality: `sq8`.
+3. Memory pressure is extreme and lower recall is acceptable: `sq4`.
+4. Want a compression/performance balance and can accept tuning: `pq`.
+
+Recommended validation process:
+
+1. Build a baseline with `flat` first.
+2. Test `sq8` first; compare recall and P95/P99 latency.
+3. If memory is still too high, test `pq` (`pq_m = D/2` can be a starting point).
+4. Consider `sq4` only when memory takes priority over recall.
+
+## Benchmarking Notes
+
+- Absolute times depend strongly on hardware, thread count, and dataset.
+- For side-by-side comparison, keep fixed:
+  - vector dimension,
+  - index parameters,
+  - segment size,
+  - query set and ground truth.
+- Cover both quality and cost metrics:
+  - Recall@K,
+  - index size,
+  - build time,
+  - query latency.
+
+## Related Documents
+
+- [Vector Search Overview](./overview.md)
+- [HNSW](./hnsw.md)
+- [IVF](./ivf.md)
+- [ANN Resource Estimation Guide](./resource-estimation.md)
diff --git a/sidebars.ts b/sidebars.ts
index 6a55db0c664..aa065f6a01d 100644
--- a/sidebars.ts
+++ b/sidebars.ts
@@ -333,6 +333,7 @@ const sidebars: SidebarsConfig = {
'ai/vector-search/ivf',
'ai/vector-search/index-management',
'ai/vector-search/resource-estimation',
+ 'ai/vector-search/quantization-survey',
'ai/vector-search/performance',
'ai/vector-search/performance-large-scale',
'ai/vector-search/behind-index',
diff --git
a/static/images/vector-search/quantization-survey/pq-build-time-vs-rows.png
b/static/images/vector-search/quantization-survey/pq-build-time-vs-rows.png
new file mode 100644
index 00000000000..645159ec8ed
Binary files /dev/null and
b/static/images/vector-search/quantization-survey/pq-build-time-vs-rows.png
differ
diff --git
a/static/images/vector-search/quantization-survey/pq-index-size-on-disk-vs-rows.png
b/static/images/vector-search/quantization-survey/pq-index-size-on-disk-vs-rows.png
new file mode 100644
index 00000000000..3ad76515f45
Binary files /dev/null and
b/static/images/vector-search/quantization-survey/pq-index-size-on-disk-vs-rows.png
differ
diff --git
a/static/images/vector-search/quantization-survey/pq-search-time-vs-rows.png
b/static/images/vector-search/quantization-survey/pq-search-time-vs-rows.png
new file mode 100644
index 00000000000..68f9c04ec01
Binary files /dev/null and
b/static/images/vector-search/quantization-survey/pq-search-time-vs-rows.png
differ
diff --git
a/static/images/vector-search/quantization-survey/sq-build-time-vs-rows.png
b/static/images/vector-search/quantization-survey/sq-build-time-vs-rows.png
new file mode 100644
index 00000000000..2493c88c408
Binary files /dev/null and
b/static/images/vector-search/quantization-survey/sq-build-time-vs-rows.png
differ
diff --git
a/static/images/vector-search/quantization-survey/sq-memory-usage-vs-rows.png
b/static/images/vector-search/quantization-survey/sq-memory-usage-vs-rows.png
new file mode 100644
index 00000000000..6925c75e7a0
Binary files /dev/null and
b/static/images/vector-search/quantization-survey/sq-memory-usage-vs-rows.png
differ
diff --git a/versioned_docs/version-4.x/ai/vector-search/quantization-survey.md
b/versioned_docs/version-4.x/ai/vector-search/quantization-survey.md
new file mode 100644
index 00000000000..6a2d032120b
--- /dev/null
+++ b/versioned_docs/version-4.x/ai/vector-search/quantization-survey.md
@@ -0,0 +1,212 @@
+---
+{
+ "title": "Vector Quantization Survey and Selection Guide",
+ "language": "en",
+ "description": "A practical survey of SQ, PQ, and related quantization
methods for Doris ANN, with trade-offs and selection guidance."
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This document introduces common vector quantization methods from a practical
perspective, and explains how to apply them in Apache Doris ANN workloads.
+
+## Why Quantization Is Needed
+
+For ANN workloads, especially HNSW, index memory can quickly become the
bottleneck. Quantization maps high-precision vectors (usually float32) to
lower-precision codes, trading a small amount of recall for lower memory usage.
+
+In Doris, quantization is controlled by the `quantizer` property in ANN
indexes:
+- `flat`: no quantization (highest quality, highest memory)
+- `sq8`: scalar quantization, 8-bit
+- `sq4`: scalar quantization, 4-bit
+- `pq`: product quantization
+
+Example (HNSW + quantizer):
+
+```sql
+CREATE TABLE vector_tbl (
+ id BIGINT,
+ embedding ARRAY<FLOAT>,
+ INDEX ann_idx (embedding) USING ANN PROPERTIES (
+ "index_type" = "hnsw",
+ "metric_type" = "l2_distance",
+ "dim" = "768",
+ "quantizer" = "sq8"
+ )
+)
+DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 8
+PROPERTIES ("replication_num" = "3");
+```
+
+## Method Overview
+
+| Method | Core Idea | Typical Gain | Main Cost |
+|---|---|---|---|
+| SQ (Scalar Quantization) | Quantize each dimension independently | Large
memory reduction, simple implementation | Build slower than FLAT; recall drops
with stronger compression |
+| PQ (Product Quantization) | Split vector into subvectors, quantize each
subvector with codebooks | Better compression/latency balance on many datasets
| Training/encoding cost is high; tuning is required |
+
+Apache Doris currently uses an optimized Faiss implementation as the core
engine for ANN vector indexing and search. The SQ/PQ behavior discussed below
is therefore directly relevant to Doris in practice.
+
+## Scalar Quantization (SQ)
+
+### Principle
+
+SQ keeps the vector dimension unchanged and only lowers per-dimension
precision.
+
+A standard min-max mapping per dimension is:
+- `max_code = (1 << b) - 1`
+- `scale = (max_val - min_val) / max_code`
+- `code = round((x - min_val) / scale)`
+
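As a concrete illustration of the mapping above, a minimal standalone encoder/decoder might look like this (an assumed `ScalarQuantizer` helper with clamping, for demonstration only; not the Doris or Faiss API):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Per-dimension min-max scalar quantization with b bits (illustrative only).
struct ScalarQuantizer {
    float vmin, vmax;  // trained range for this dimension
    int bits;          // e.g. 8 for sq8, 4 for sq4

    uint32_t encode(float x) const {
        const uint32_t max_code = (1u << bits) - 1;
        const float scale = (vmax - vmin) / max_code;
        float code = std::round((x - vmin) / scale);
        if (code < 0) code = 0;                    // clamp out-of-range inputs
        if (code > max_code) code = max_code;
        return static_cast<uint32_t>(code);
    }

    float decode(uint32_t code) const {
        const uint32_t max_code = (1u << bits) - 1;
        const float scale = (vmax - vmin) / max_code;
        return vmin + code * scale;  // representative value of the bucket
    }
};
```

With `bits = 8` the round-trip error of a single value is bounded by one quantization step (`scale`), which is where the recall loss comes from.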
+Faiss SQ has two styles:
+- Uniform: all dimensions share one min/max range.
+- Non-uniform: each dimension uses its own min/max.
+
+When dimensions have very different value ranges, non-uniform SQ usually gives
better reconstruction quality.
+
+### Key Characteristics
+
+- Strengths:
+ - Straightforward and stable.
+  - Predictable compression (roughly 4x smaller than float32 for `sq8`, roughly 8x for `sq4`).
+- Weaknesses:
+ - Assumes distribution can be bucketed with fixed steps.
+ - If a dimension is highly non-uniform (for example, strong long-tail),
quantization error can increase.
+
+### Faiss Source-Level Note (SQ)
+
+Under the optimized Faiss implementation path used by Doris, SQ training computes min/max statistics first, then expands the range slightly to reduce out-of-range risk at add time. A simplified shape is:
+
+```cpp
+void train_Uniform(..., const float* x, std::vector<float>& trained) {
+ trained.resize(2);
+ float& vmin = trained[0];
+ float& vmax = trained[1];
+ // scan all values to get min/max
+ // then optionally expand range by rs_arg
+}
+```
+
+For non-uniform SQ, Faiss computes statistics per dimension (instead of one
global range), which is why it typically behaves better when different
dimensions have very different value scales.
+
+### Practical Observations
+
+In the internal 128D/256D HNSW tests:
+- `sq8` generally preserved recall better than `sq4`.
+- SQ index build/add time was significantly higher than FLAT.
+- Search latency change was often small for `sq8`, while `sq4` had larger
recall drop.
+
+The following bar charts are based on example benchmark data:
+
+![SQ build time vs rows](/images/vector-search/quantization-survey/sq-build-time-vs-rows.png)
+
+![SQ memory usage vs rows](/images/vector-search/quantization-survey/sq-memory-usage-vs-rows.png)
+
+## Product Quantization (PQ)
+
+### Principle
+
+PQ splits a `D`-dim vector into `M` subvectors (`D/M` dimensions each), then
applies k-means codebooks to each subspace.
+
+Main parameters:
+- `pq_m`: number of subquantizers (subvectors)
+- `pq_nbits`: bits per subvector code
+
+Larger `pq_m` usually improves quality but increases training/encoding cost.
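The memory impact of these two parameters can be estimated with simple arithmetic. The helpers below are hypothetical (not a Doris API) and just restate the definitions:

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to store one encoded vector: pq_m codes of pq_nbits each.
size_t pq_code_bytes(size_t pq_m, size_t pq_nbits) {
    return (pq_m * pq_nbits + 7) / 8;  // round up to whole bytes
}

// Total floats across all codebooks: pq_m subspaces, 2^pq_nbits centroids
// each, dim / pq_m dimensions per centroid (simplifies to 2^pq_nbits * dim).
size_t pq_codebook_floats(size_t dim, size_t pq_m, size_t pq_nbits) {
    const size_t ksub = static_cast<size_t>(1) << pq_nbits;
    return pq_m * ksub * (dim / pq_m);
}
```

For example, `dim = 768`, `pq_m = 96`, `pq_nbits = 8` gives 96 bytes per encoded vector, versus `768 * 4 = 3072` bytes for raw float32, plus a fixed codebook of `256 * 768` floats shared by all vectors.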
+
+### Why PQ Can Be Faster at Query Time
+
+PQ can use LUT (look-up table) distance estimation:
+- Precompute distances between query subvectors and codebook centroids.
+- Approximate full-vector distance by table lookups + accumulation.
+
+This avoids full reconstruction and can reduce search CPU cost.
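The two steps can be sketched as a plain ADC (asymmetric distance computation) routine. This is an illustrative sketch, not the Faiss implementation; it assumes centroids stored contiguously as `(M, ksub, dsub)` and one `uint8` code per subvector:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Squared-L2 distance estimate between a query and one PQ-encoded vector.
float adc_distance(const std::vector<float>& query,      // size M * dsub
                   const std::vector<uint8_t>& code,     // size M
                   const std::vector<float>& centroids,  // size M * ksub * dsub
                   int M, int ksub, int dsub) {
    // Step 1: per-subspace lookup table of query-to-centroid squared distances.
    std::vector<float> lut(M * ksub, 0.0f);
    for (int m = 0; m < M; m++)
        for (int k = 0; k < ksub; k++)
            for (int d = 0; d < dsub; d++) {
                float diff = query[m * dsub + d] -
                             centroids[(m * ksub + k) * dsub + d];
                lut[m * ksub + k] += diff * diff;
            }
    // Step 2: the distance to any encoded vector is M lookups + adds.
    float dist = 0.0f;
    for (int m = 0; m < M; m++)
        dist += lut[m * ksub + code[m]];
    return dist;
}
```

The table costs `M * ksub` subvector distance computations once per query; after that, each candidate costs only `M` lookups and adds, independent of the original dimension `D`.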
+
+### Faiss Source-Level Note (PQ)
+
+Under the same implementation path, Faiss `ProductQuantizer` trains codebooks
over subspaces and stores them in a contiguous centroid table. A simplified
shape is:
+
+```cpp
+void ProductQuantizer::train(size_t n, const float* x) {
+ Clustering clus(dsub, ksub, cp);
+ IndexFlatL2 index(dsub);
+ clus.train(n * M, x, index);
+ for (int m = 0; m < M; m++) {
+ set_params(clus.centroids.data(), m);
+ }
+}
+```
+
+Centroids are laid out as `(M, ksub, dsub)`, where:
+- `M`: number of subquantizers,
+- `ksub`: codebook size per subspace (`2^pq_nbits`),
+- `dsub`: subvector dimension (`D / M`).
+
+### Practical Observations
+
+In the same internal tests:
+- PQ showed clear compression benefits.
+- PQ encoding/training overhead was high.
+- Compared with SQ, PQ often had better search-time behavior due to LUT
acceleration, but recall/build trade-offs depended on data and parameters.
+
+The following bar charts are based on example benchmark data:
+
+![PQ index size on disk vs rows](/images/vector-search/quantization-survey/pq-index-size-on-disk-vs-rows.png)
+
+![PQ build time vs rows](/images/vector-search/quantization-survey/pq-build-time-vs-rows.png)
+
+![PQ search time vs rows](/images/vector-search/quantization-survey/pq-search-time-vs-rows.png)
+
+## Practical Selection Guide for Doris
+
+Use this as a starting point:
+
+1. Memory is sufficient and recall is top priority: `flat`.
+2. Need low risk compression with relatively stable quality: `sq8`.
+3. Extreme memory pressure and can accept lower recall: `sq4`.
+4. Need stronger memory-performance balance and can spend time tuning: `pq`.
+
+Recommended validation process:
+
+1. Start with `flat` as baseline.
+2. Test `sq8` first; compare recall and P95/P99 latency.
+3. If memory is still too high, test `pq` (`pq_m = D/2` as first trial).
+4. Use `sq4` only when memory reduction has higher priority than recall.
+
+## Benchmarking Notes
+
+- Absolute times are hardware/thread/dataset dependent.
+- Compare methods under the same:
+ - vector dimension,
+ - index parameters,
+ - segment size,
+ - query set and ground truth.
+- Evaluate both quality and cost:
+ - Recall@K,
+ - index size,
+ - build time,
+ - query latency.
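For the quality side, Recall@K for a single query is simply the overlap between returned ids and ground-truth ids. A minimal order-insensitive sketch (hypothetical helper):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Fraction of the K ground-truth neighbors present among the K returned ids.
float recall_at_k(std::vector<int> returned, std::vector<int> truth) {
    std::sort(returned.begin(), returned.end());
    std::sort(truth.begin(), truth.end());
    std::vector<int> hit;
    std::set_intersection(returned.begin(), returned.end(),
                          truth.begin(), truth.end(),
                          std::back_inserter(hit));
    return static_cast<float>(hit.size()) / truth.size();
}
```

Averaging this value over the query set, once against each quantizer's results, gives the Recall@K numbers to compare alongside size, build time, and latency.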
+
+## Related Documents
+
+- [Overview](./overview.md)
+- [HNSW](./hnsw.md)
+- [IVF](./ivf.md)
+- [ANN Resource Estimation Guide](./resource-estimation.md)
diff --git a/versioned_sidebars/version-4.x-sidebars.json
b/versioned_sidebars/version-4.x-sidebars.json
index 769ffbfe76f..a61946af0f0 100644
--- a/versioned_sidebars/version-4.x-sidebars.json
+++ b/versioned_sidebars/version-4.x-sidebars.json
@@ -333,6 +333,7 @@
"ai/vector-search/ivf",
"ai/vector-search/index-management",
"ai/vector-search/resource-estimation",
+ "ai/vector-search/quantization-survey",
"ai/vector-search/performance",
"ai/vector-search/performance-large-scale",
"ai/vector-search/behind-index"
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]