This is an automated email from the ASF dual-hosted git repository.
yiguolei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 2cbc43783f9 [codex] fix vector-search ANN index_type docs consistency (#3467)
2cbc43783f9 is described below
commit 2cbc43783f9ddc2a8742ce52bf74087a95627ae0
Author: zhiqiang <[email protected]>
AuthorDate: Mon Mar 16 19:13:08 2026 +0800
[codex] fix vector-search ANN index_type docs consistency (#3467)
## Summary
This PR fixes several inconsistencies in the vector-search docs concerning
which ANN `index_type` values are supported and how their parameters are
described.
## Problem
The overview pages previously stated that `index_type` only supported
`hnsw`, while Doris already supports both `hnsw` and `ivf`. This could
mislead users when choosing index algorithms.
There were also related clarity issues:
- `overview` parameter tables did not include IVF's key parameter
`nlist`.
- Chinese overview pages used `metric` in one bullet while table/schema
use `metric_type`.
- Two English page titles had a typo: `Apaceh`.
## Changes
- Updated the `index_type` support wording to `hnsw` + `ivf` in:
  - `docs/ai/vector-search/overview.md`
  - `versioned_docs/version-4.x/ai/vector-search/overview.md`
  - `i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search/overview.md`
  - `i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/vector-search/overview.md`
- Added an `nlist` row (IVF-specific, default `1024`) to the same overview
  parameter tables.
- Fixed the Chinese bullet `metric` -> `metric_type` in the current/4.x
  overview pages.
- Fixed the title typo `Apaceh` -> `Apache` in:
  - `docs/ai/vector-search/ivf.md`
  - `docs/ai/vector-search/hnsw.md`
## User Impact
Users now get consistent and accurate ANN capability docs, clearer IVF
parameter guidance from overview pages, and aligned EN/ZH terminology.
## Validation
- Verified diffs are limited to the six targeted documentation files.
- Re-scanned vector-search docs to confirm no remaining "only HNSW"
wording in overview capability descriptions.
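The Validation list says the docs were re-scanned for leftover "only HNSW" wording but does not say how. One possible way to repeat that check is sketched below in Python: walk a docs tree and report Markdown lines that match a few patterns. The patterns and the `find_stale_wording` helper are hypothetical illustrations, not what the PR author actually ran.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Tuple

# Illustrative patterns for the outdated "HNSW only" claim (assumed, not taken from the PR).
# The last one matches the former zh-CN table cell "仅支持:hnsw" ("supported only: hnsw"),
# accepting either an ASCII or a full-width colon.
STALE_PATTERNS = [
    re.compile(r"\bonly\s+HNSW\b", re.IGNORECASE),
    re.compile(r"\bhnsw\s+only\b", re.IGNORECASE),
    re.compile(r"仅支持\s*[:：]\s*hnsw", re.IGNORECASE),
]


def find_stale_wording(root: Path) -> List[Tuple[str, int, str]]:
    """Return (relative path, line number, line) for each Markdown line matching a pattern."""
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(root.rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if any(p.search(line) for p in STALE_PATTERNS):
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Demo on a throwaway directory: only the old wording is reported.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "old.md").write_text("| `index_type` | Yes | hnsw only | (none) |\n", encoding="utf-8")
        (root / "new.md").write_text("| `index_type` | Yes | `hnsw`, `ivf` | (none) |\n", encoding="utf-8")
        for hit in find_stale_wording(root):
            print(hit)
```

A regex scan like this only catches phrasings you anticipate, so it complements, rather than replaces, reading the changed pages.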
---
docs/ai/vector-search/hnsw.md | 2 +-
docs/ai/vector-search/ivf.md | 3 +--
docs/ai/vector-search/overview.md | 5 +++--
.../current/ai/vector-search/overview.md | 7 ++++---
.../version-4.x/ai/vector-search/overview.md | 7 ++++---
versioned_docs/version-4.x/ai/vector-search/overview.md | 5 +++--
6 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/docs/ai/vector-search/hnsw.md b/docs/ai/vector-search/hnsw.md
index f2952ca9e9c..a89efe8c867 100644
--- a/docs/ai/vector-search/hnsw.md
+++ b/docs/ai/vector-search/hnsw.md
@@ -24,7 +24,7 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
-# HNSW and How to use it in Apaceh Doris
+# HNSW and How to use it in Apache Doris
HNSW (Malkov & Yashunin, 2016) has become the de facto standard for
high‑performance online vector search thanks to its ability to achieve high
recall and low latency with relatively modest resource consumption. Since
Apache Doris 4.x, an ANN index based on HNSW has been supported. This document
walks through the HNSW algorithm, key parameters, and engineering practices,
and explains how to build and tune HNSW‑based ANN indexes in production Doris
clusters.
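The HNSW page above refers to the algorithm and its key parameters, such as `max_degree` (M) from the overview table. To show what searching such a graph involves, here is a simplified Python sketch of the best-first search HNSW performs within a single layer. It assumes a brute-force k-nearest-neighbor graph in place of HNSW's incremental, multi-layer construction. `build_knn_graph` and `search_layer` are hypothetical names, not Doris APIs, and `ef` is the generic HNSW candidate-list size, not necessarily the name of any Doris setting.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_knn_graph(points: List[Vector], max_degree: int) -> Dict[int, List[int]]:
    """Connect every point to its `max_degree` nearest other points (directed edges).

    Simplification: HNSW inserts points incrementally into several layers and
    prunes neighbor lists heuristically; this brute-force graph stands in for one layer.
    """
    graph: Dict[int, List[int]] = {}
    for i, p in enumerate(points):
        others = sorted((l2(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in others[:max_degree]]
    return graph


def search_layer(
    graph: Dict[int, List[int]],
    points: List[Vector],
    query: Vector,
    entry: int,
    ef: int,
    k: int,
) -> List[Tuple[float, int]]:
    """Best-first search keeping at most `ef` results; returns the k closest (distance, id)."""
    d0 = l2(points[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node on top
    results = [(-d0, entry)]    # max-heap via negated distance: worst kept result on top
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -results[0][0]:
            break  # the closest candidate is worse than every kept result: stop
        for nbr in graph[node]:
            if nbr in visited:
                continue
            visited.add(nbr)
            d = l2(points[nbr], query)
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nbr))
                heapq.heappush(results, (-d, nbr))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the current worst result
    return sorted((-neg_d, i) for neg_d, i in results)[:k]


if __name__ == "__main__":
    pts = [[float(x)] for x in range(10)]
    g = build_knn_graph(pts, max_degree=2)
    print(search_layer(g, pts, [7.2], entry=0, ef=4, k=3))
```

Raising `ef` keeps more candidates alive during the walk. This typically improves recall at the cost of more distance computations. A larger `max_degree` gives each node more edges to follow, which also costs memory.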
diff --git a/docs/ai/vector-search/ivf.md b/docs/ai/vector-search/ivf.md
index 93b8491c8c7..123102c123b 100644
--- a/docs/ai/vector-search/ivf.md
+++ b/docs/ai/vector-search/ivf.md
@@ -23,7 +23,7 @@ specific language governing permissions and limitations
under the License.
-->
-# IVF and How to use it in Apaceh Doris
+# IVF and How to use it in Apache Doris
IVF index is an efficient data structure used for Approximate Nearest Neighbor
(ANN) search. It helps narrow down the scope of vectors during search,
significantly improving search speed. Since Apache Doris 4.x, an ANN index
based on IVF has been supported. This document walks through the IVF algorithm,
key parameters, and engineering practices, and explains how to build and tune
IVF‑based ANN indexes in production Doris clusters.
@@ -364,4 +364,3 @@ NUM_PER_BATCH=1000000 python3 -m vectordbbench doris --host 127.0.0.1 --port 903
# search
NUM_PER_BATCH=1000000 python3 -m vectordbbench doris --host 127.0.0.1 --port 9030 --case-type Performance768D1M --db-name Performance768D1M --search-concurrent --search-serial --num-concurrency 10,40,80 --stream-load-rows-per-batch 500000 --index-prop index_type=ivf,nlist=1024 --session-var ivf_nprobe=64 --skip-load --skip-drop-old
```
-
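The IVF paragraph and the benchmark command above rely on two knobs: `nlist` (the number of inverted lists, `1024` in the command) and the `ivf_nprobe` session variable (`64`). The Python sketch below shows the idea. It trains `nlist` centroids, files each vector under its nearest centroid, and at query time scans only the `nprobe` closest lists. It assumes plain Lloyd's k-means with random initialization. `IvfIndex` is a hypothetical toy, not Doris's implementation.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(data: List[Vector], nlist: int, iters: int = 10, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means returning `nlist` centroids (assumed training method)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, nlist)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(nlist)]
        for v in data:
            nearest = min(range(nlist), key=lambda i: l2(v, centroids[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class IvfIndex:
    """Toy IVF index: one inverted list of row ids per centroid."""

    def __init__(self, data: List[Vector], nlist: int, seed: int = 0) -> None:
        self.data = data
        self.centroids = kmeans(data, nlist, seed=seed)
        self.lists: Dict[int, List[int]] = {i: [] for i in range(nlist)}
        for row_id, v in enumerate(data):
            self.lists[self.closest_lists(v, 1)[0]].append(row_id)

    def closest_lists(self, v: Vector, nprobe: int) -> List[int]:
        """Ids of the `nprobe` centroids nearest to v."""
        order = sorted(range(len(self.centroids)), key=lambda i: l2(v, self.centroids[i]))
        return order[:nprobe]

    def search(self, query: Vector, k: int, nprobe: int) -> List[int]:
        """Scan only the rows in the `nprobe` closest lists and return the top-k row ids."""
        candidates = [r for c in self.closest_lists(query, nprobe) for r in self.lists[c]]
        return sorted(candidates, key=lambda r: l2(self.data[r], query))[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random(), rng.random()] for _ in range(2000)]
    index = IvfIndex(data, nlist=32)
    for nprobe in (1, 4, 32):
        print(nprobe, index.search([0.5, 0.5], k=5, nprobe=nprobe))
```

With `nprobe` equal to `nlist`, every list is scanned and the result matches exact search. Smaller `nprobe` values scan a fraction of the data, trading recall for speed.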
diff --git a/docs/ai/vector-search/overview.md b/docs/ai/vector-search/overview.md
index b2b7c95e7cb..55cd7a47fb6 100644
--- a/docs/ai/vector-search/overview.md
+++ b/docs/ai/vector-search/overview.md
@@ -58,16 +58,17 @@ PROPERTIES (
);
```
-- index_type: `hnsw` means using the [Hierarchical Navigable Small World algorithm](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world)
+- index_type: `hnsw` (for [Hierarchical Navigable Small World](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world)) or `ivf` (for inverted file)
- metric_type: `l2_distance` means using L2 distance as the distance function
- dim: `128` means the vector dimension is 128
- quantizer: `flat` means each vector dimension is stored as original float32
| Parameter | Required | Supported/Options | Default | Description |
|-----------|----------|-------------------|---------|-------------|
-| `index_type` | Yes | hnsw only | (none) | ANN index algorithm. Currently only HNSW supported. |
+| `index_type` | Yes | `hnsw`, `ivf` | (none) | ANN index algorithm. Currently supports HNSW and IVF. |
| `metric_type` | Yes | `l2_distance`, `inner_product` | (none) | Vector similarity/distance metric. L2 = Euclidean; inner_product can approximate cosine if vectors are normalized. |
| `dim` | Yes | Positive integer (> 0) | (none) | Vector dimension. All imported vectors must match or an error is raised. |
+| `nlist` | No | Positive integer | `1024` | IVF inverted-list count. Effective when `index_type=ivf`; larger values may improve recall/speed trade-offs but increase build overhead. |
| `max_degree` | No | Positive integer | `32` | HNSW M (max neighbors per node). Affects index memory and search performance. |
| `ef_construction` | No | Positive integer | `40` | HNSW efConstruction (candidate queue size during build). Larger gives better quality but slower build. |
| `quantizer` | No | `flat`, `sq8`, `sq4`, `pq` | `flat` | Vector encoding/quantization: `flat` = raw; `sq8`/`sq4` = scalar quantization (8/4 bit), `pq` = product quantization to reduce memory. |
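The updated table can be read as a small set of validation rules. These are the required keys, the allowed `index_type`/`metric_type`/`quantizer` values, the documented defaults (with `nlist` defaulting to `1024` only when `index_type=ivf`), and the rule that imported vectors must match `dim`. The Python sketch below encodes those rules for a string-valued PROPERTIES map. `validate_ann_properties` and `check_vector_dim` are hypothetical helpers, and Doris's real checks and error messages may differ.

```python
from typing import Dict, Sequence, Union

# Allowed values copied from the parameter table.
ALLOWED_VALUES = {
    "index_type": {"hnsw", "ivf"},
    "metric_type": {"l2_distance", "inner_product"},
    "quantizer": {"flat", "sq8", "sq4", "pq"},
}
REQUIRED = ("index_type", "metric_type", "dim")
# Documented defaults for optional integer parameters, grouped by algorithm.
DEFAULTS_BY_INDEX_TYPE = {
    "hnsw": {"max_degree": "32", "ef_construction": "40"},
    "ivf": {"nlist": "1024"},
}
INTEGER_KEYS = {"dim", "nlist", "max_degree", "ef_construction"}


def validate_ann_properties(props: Dict[str, str]) -> Dict[str, Union[str, int]]:
    """Hypothetical checker: apply the table's rules and defaults to a PROPERTIES map."""
    for key in REQUIRED:
        if key not in props:
            raise ValueError(f"missing required property: {key}")
    resolved: Dict[str, str] = {"quantizer": "flat", **props}
    for key, allowed in ALLOWED_VALUES.items():
        if resolved[key] not in allowed:
            raise ValueError(f"{key}={resolved[key]!r}, expected one of {sorted(allowed)}")
    # Fill in only the defaults of the chosen algorithm (e.g. nlist for ivf).
    resolved = {**DEFAULTS_BY_INDEX_TYPE[resolved["index_type"]], **resolved}
    result: Dict[str, Union[str, int]] = {}
    for key, value in resolved.items():
        if key in INTEGER_KEYS:
            number = int(value)  # raises ValueError for non-numeric strings
            if number <= 0:
                raise ValueError(f"{key} must be a positive integer, got {value!r}")
            result[key] = number
        else:
            result[key] = value
    return result


def check_vector_dim(vector: Sequence[float], dim: int) -> None:
    """Imported vectors must have exactly `dim` components, otherwise raise."""
    if len(vector) != dim:
        raise ValueError(f"vector has {len(vector)} dimensions, index expects {dim}")
```

For example, `{"index_type": "ivf", "metric_type": "l2_distance", "dim": "128"}` resolves to `nlist=1024` and `quantizer="flat"` with no HNSW-only keys.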
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search/overview.md
index 207a42303b4..4871c8e0f4d 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search/overview.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search/overview.md
@@ -49,17 +49,18 @@ PROPERTIES (
"replication_num" = "1"
);
```
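-- index_type: hnsw means using the [Hierarchical Navigable Small World algorithm](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world)
-- metric: l2_distance means using L2 distance as the distance function
+- index_type: either `hnsw` ([Hierarchical Navigable Small World algorithm](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world)) or `ivf` (inverted file index)
+- metric_type: l2_distance means using L2 distance as the distance function
- dim: 128 means the vector dimension is 128
- quantizer: flat means each dimension is stored as raw float32
| Parameter | Required | Supported/Options | Default | Description |
|------|----------|-------------|--------|------|
-| `index_type` | Yes | Supported only: hnsw | (none) | Specifies the ANN index algorithm. Currently only HNSW is supported. |
+| `index_type` | Yes | Supported: `hnsw`, `ivf` | (none) | Specifies the ANN index algorithm. Currently HNSW and IVF are supported. |
| `metric_type` | Yes | `l2_distance`, `inner_product` | (none) | Specifies the vector similarity/distance metric. L2 is Euclidean distance; to use inner_product for cosine similarity, normalize the vectors first. |
| `dim` | Yes | Positive integer (> 0) | (none) | Specifies the vector dimension; all subsequently imported vectors must have this dimension, otherwise an error is raised. |
+| `nlist` | No | Positive integer | `1024` | Number of IVF inverted buckets. Takes effect when `index_type=ivf`; larger values usually help the recall/speed trade-off but increase build overhead. |
| `max_degree` | No | Positive integer | `32` | Maximum number of neighbors per node in the HNSW graph (M); affects index memory and search performance. |
| `ef_construction` | No | Positive integer | `40` | Candidate queue size during HNSW construction (efConstruction); larger values give better graph quality but slower builds. |
| `quantizer` | No | `flat`, `sq8`, `sq4`, `pq` | `flat` | Specifies the vector encoding/quantization: `flat` is raw storage, `sq8`/`sq4` are scalar quantization (8/4 bit), `pq` is product quantization. |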
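The `metric_type` row notes that `inner_product` can stand in for cosine similarity only after the vectors are normalized. The short Python check below shows why. For unit-length vectors the inner product equals the cosine, and the squared L2 distance equals 2 minus twice the cosine, so all three rank neighbors identically. Without normalization, the inner product favors long vectors. The query and document vectors are arbitrary examples.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> List[float]:
    """Scale v to unit L2 norm."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def l2_squared(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank(query: Vector, docs: Dict[str, Vector],
         score: Callable[[Vector, Vector], float], higher_is_better: bool) -> List[str]:
    """Doc ids ordered from best to worst under the given scoring function."""
    return sorted(docs, key=lambda k: score(query, docs[k]), reverse=higher_is_better)


query = [3.0, 4.0]
docs = {"a": [10.0, 0.0], "b": [0.6, 0.8], "c": [2.0, 2.0]}
unit_query = normalize(query)
unit_docs = {k: normalize(v) for k, v in docs.items()}

raw_ip = rank(query, docs, dot, True)                      # favors long vectors: a, c, b
cos = rank(query, docs, cosine, True)                      # by angle only: b, c, a
unit_ip = rank(unit_query, unit_docs, dot, True)           # same order as cosine
unit_l2 = rank(unit_query, unit_docs, l2_squared, False)   # same order as cosine
print(raw_ip, cos, unit_ip, unit_l2)
```

This is why normalizing vectors before import (and normalizing the query) lets an `inner_product` index serve cosine-similarity search.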
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/vector-search/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/vector-search/overview.md
index 207a42303b4..4871c8e0f4d 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/vector-search/overview.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/vector-search/overview.md
@@ -49,17 +49,18 @@ PROPERTIES (
"replication_num" = "1"
);
```
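-- index_type: hnsw means using the [Hierarchical Navigable Small World algorithm](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world)
-- metric: l2_distance means using L2 distance as the distance function
+- index_type: either `hnsw` ([Hierarchical Navigable Small World algorithm](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world)) or `ivf` (inverted file index)
+- metric_type: l2_distance means using L2 distance as the distance function
- dim: 128 means the vector dimension is 128
- quantizer: flat means each dimension is stored as raw float32
| Parameter | Required | Supported/Options | Default | Description |
|------|----------|-------------|--------|------|
-| `index_type` | Yes | Supported only: hnsw | (none) | Specifies the ANN index algorithm. Currently only HNSW is supported. |
+| `index_type` | Yes | Supported: `hnsw`, `ivf` | (none) | Specifies the ANN index algorithm. Currently HNSW and IVF are supported. |
| `metric_type` | Yes | `l2_distance`, `inner_product` | (none) | Specifies the vector similarity/distance metric. L2 is Euclidean distance; to use inner_product for cosine similarity, normalize the vectors first. |
| `dim` | Yes | Positive integer (> 0) | (none) | Specifies the vector dimension; all subsequently imported vectors must have this dimension, otherwise an error is raised. |
+| `nlist` | No | Positive integer | `1024` | Number of IVF inverted buckets. Takes effect when `index_type=ivf`; larger values usually help the recall/speed trade-off but increase build overhead. |
| `max_degree` | No | Positive integer | `32` | Maximum number of neighbors per node in the HNSW graph (M); affects index memory and search performance. |
| `ef_construction` | No | Positive integer | `40` | Candidate queue size during HNSW construction (efConstruction); larger values give better graph quality but slower builds. |
| `quantizer` | No | `flat`, `sq8`, `sq4`, `pq` | `flat` | Specifies the vector encoding/quantization: `flat` is raw storage, `sq8`/`sq4` are scalar quantization (8/4 bit), `pq` is product quantization. |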
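The `quantizer` row lists `sq8`/`sq4` scalar quantization. Below is a Python sketch of one common 8-bit scheme, per-dimension min/max scaling onto 0..255. This scheme is an assumption for illustration; Doris's actual `sq8` encoding may differ. The storage arithmetic for `dim=128` is straightforward. `flat` float32 needs 128 × 4 = 512 bytes per vector, an 8-bit code needs 128 bytes, and a 4-bit code needs 64 bytes. These figures ignore per-index metadata such as the min/max tables.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def train_sq8(data: Sequence[Vector]) -> Tuple[List[float], List[float]]:
    """Per-dimension min and max over the training vectors."""
    dims = len(data[0])
    lows = [min(v[d] for v in data) for d in range(dims)]
    highs = [max(v[d] for v in data) for d in range(dims)]
    return lows, highs


def encode_sq8(v: Vector, lows: List[float], highs: List[float]) -> bytes:
    """Map each dimension linearly onto 0..255 (one byte per dimension)."""
    codes = []
    for x, lo, hi in zip(v, lows, highs):
        span = hi - lo
        t = 0.0 if span == 0 else (min(max(x, lo), hi) - lo) / span  # clamp to trained range
        codes.append(round(t * 255))
    return bytes(codes)


def decode_sq8(code: bytes, lows: List[float], highs: List[float]) -> List[float]:
    """Approximate reconstruction; per-dimension error is at most (hi - lo) / 510."""
    return [lo + (c / 255) * (hi - lo) for c, lo, hi in zip(code, lows, highs)]


def bytes_per_vector(dim: int, bits_per_dim: int) -> float:
    """Raw code size only, ignoring per-index metadata such as the min/max tables."""
    return dim * bits_per_dim / 8


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(100)]
    lows, highs = train_sq8(data)
    code = encode_sq8(data[0], lows, highs)
    print(len(code), data[0][:3], decode_sq8(code, lows, highs)[:3])
    print({name: bytes_per_vector(128, bits) for name, bits in [("flat", 32), ("sq8", 8), ("sq4", 4)]})
```

The 4x and 8x memory savings come at the cost of a bounded reconstruction error per dimension, which is why quantized indexes can lose some recall.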
diff --git a/versioned_docs/version-4.x/ai/vector-search/overview.md b/versioned_docs/version-4.x/ai/vector-search/overview.md
index b2b7c95e7cb..55cd7a47fb6 100644
--- a/versioned_docs/version-4.x/ai/vector-search/overview.md
+++ b/versioned_docs/version-4.x/ai/vector-search/overview.md
@@ -58,16 +58,17 @@ PROPERTIES (
);
```
-- index_type: `hnsw` means using the [Hierarchical Navigable Small World algorithm](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world)
+- index_type: `hnsw` (for [Hierarchical Navigable Small World](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world)) or `ivf` (for inverted file)
- metric_type: `l2_distance` means using L2 distance as the distance function
- dim: `128` means the vector dimension is 128
- quantizer: `flat` means each vector dimension is stored as original float32
| Parameter | Required | Supported/Options | Default | Description |
|-----------|----------|-------------------|---------|-------------|
-| `index_type` | Yes | hnsw only | (none) | ANN index algorithm. Currently only HNSW supported. |
+| `index_type` | Yes | `hnsw`, `ivf` | (none) | ANN index algorithm. Currently supports HNSW and IVF. |
| `metric_type` | Yes | `l2_distance`, `inner_product` | (none) | Vector similarity/distance metric. L2 = Euclidean; inner_product can approximate cosine if vectors are normalized. |
| `dim` | Yes | Positive integer (> 0) | (none) | Vector dimension. All imported vectors must match or an error is raised. |
+| `nlist` | No | Positive integer | `1024` | IVF inverted-list count. Effective when `index_type=ivf`; larger values may improve recall/speed trade-offs but increase build overhead. |
| `max_degree` | No | Positive integer | `32` | HNSW M (max neighbors per node). Affects index memory and search performance. |
| `ef_construction` | No | Positive integer | `40` | HNSW efConstruction (candidate queue size during build). Larger gives better quality but slower build. |
| `quantizer` | No | `flat`, `sq8`, `sq4`, `pq` | `flat` | Vector encoding/quantization: `flat` = raw; `sq8`/`sq4` = scalar quantization (8/4 bit), `pq` = product quantization to reduce memory. |