This is an automated email from the ASF dual-hosted git repository.
twice pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/kvrocks-website.git
The following commit(s) were added to refs/heads/main by this push:
new b476325 Update KVrocks search index encoding documentation (#241)
b476325 is described below
commit b476325ee40786c8625591d9020ac8e5e28229e5
Author: Rebecca Zhou <[email protected]>
AuthorDate: Thu Aug 1 18:03:39 2024 -0700
Update KVrocks search index encoding documentation (#241)
* Update KVrocks search index encoding documentation
* Update community/kvrocks-search-index-encoding.md
* Update community/kvrocks-search-index-encoding.md
---------
Co-authored-by: Twice <[email protected]>
---
community/kvrocks-search-index-encoding.md | 50 ++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)
diff --git a/community/kvrocks-search-index-encoding.md b/community/kvrocks-search-index-encoding.md
index 0e5faba..c115a5f 100644
--- a/community/kvrocks-search-index-encoding.md
+++ b/community/kvrocks-search-index-encoding.md
@@ -33,6 +33,7 @@ The common encoding format of key is as follows:
|-------------|-------------|
| tag | 1 |
| numeric | 2 |
+| vector | 3 |
The common encoding format of a *field flag* is:
@@ -92,6 +93,28 @@ where *separator* currently can only be an ASCII character, and case sensitive c
| 1+X bytes | 1 byte | 4+Y bytes | 4+Z bytes | -> | 1 byte |
```
+### HNSW Vector Field Metadata
+
+This metadata format is specifically designed to support efficient vector search using the HNSW (Hierarchical Navigable Small World) algorithm. The encoding captures various parameters and settings relevant for managing the vector index properties and optimizing vector search operations.
+
+```
+| namespace | FIELD_META | index name | field name |    | field flag | vector type | dimension | distance metric | initial cap | m       | ef construction | ef runtime | epsilon | number of levels |
+|-----------|------------|------------|------------| -> |------------|-------------|-----------|-----------------|-------------|---------|-----------------|------------|---------|------------------|
+| 1+X bytes | 1 byte     | 4+Y bytes  | 4+Z bytes  | -> | 1 byte     | 1 byte      | 2 bytes   | 1 byte          | 4 bytes     | 2 bytes | 4 bytes         | 4 bytes    | 8 bytes | 2 bytes          |
+```
+#### Required attributes
+- **vector type**: Specifies the type of vectors stored (e.g., `FLOAT32`, `FLOAT64`); Now Kvrocks only supports `FLOAT64`.
+- **dimension**: The dimensionality of the vectors (number of elements in each vector).
+- **distance metric**: Metric used for distance calculation between vectors (i.e. `L2`, `IP`, `COSINE`).
+
+#### Optional attributes
+- **initial cap**: Initial capacity of the HNSW graph, indicating the initial number of elements; Default is 500000.
+- **m**: Maximum number of edges per node in the HNSW graph; Default is 16.
+- **ef construction**: Size of the dynamic candidate list during the index construction phase; Default is 200.
+- **ef runtime**: Size of the dynamic candidate list during the search phase; Default is 10.
+- **epsilon**: Epsilon value for approximate search, controlling the trade-off between search precision and speed; Default is 0.01.
+- **number of levels**: Number of levels in the HNSW graph, affecting the hierarchical structure of the graph.
+
## Index data encoding
Index data refers to the information stored after indexing the real data,
@@ -112,3 +135,30 @@ which is used to quickly get corresponding data in subsequent query processes.
|-----------|---------|------------|------------|-----------------|------------| -> |------------|
| 1+X bytes | 1 byte  | 4+Y bytes  | 4+Z bytes  | 8 bytes         | 4+B bytes  | -> | 0 byte     |
```
+
+### HNSW Vector field
+
+#### HNSW graph entry types
+
+| hnsw type | enum value |
+|--------------|-------------|
+| NODE | 1 |
+| EDGE | 2 |
+
+#### HNSW node index encoding
+
+```
+| namespace | FIELD   | index name | field name | level   | hnsw type     | user key  |    | num of neighbours | vector dimension | vector data         |
+|-----------|---------|------------|------------|---------|---------------|-----------| -> |-------------------|------------------|---------------------|
+| 1+X bytes | 1 byte  | 4+Y bytes  | 4+Z bytes  | 2 bytes | NODE (1 byte) | 4+B bytes | -> | 2 bytes           | 2 bytes          | dimension * 8 bytes |
+```
+
+#### HNSW edge index encoding
+
+```
+| namespace | FIELD   | index name | field name | level   | hnsw type     | user key 1 | user key 2 |    | null   |
+|-----------|---------|------------|------------|---------|---------------|------------|------------| -> |--------|
+| 1+X bytes | 1 byte  | 4+Y bytes  | 4+Z bytes  | 2 bytes | EDGE (1 byte) | 4+B bytes  | 4+B bytes  | -> | 0 byte |
+```
+
+where *user key 1* and *user key 2* represent the endpoints of an edge at a specific level within the HNSW graph.
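
Editorial illustration (not part of the commit): the node and edge layouts added above can be sketched as byte-packing helpers. This is a minimal sketch under stated assumptions, not the Kvrocks implementation. It assumes that "1+X bytes" means a 1-byte length prefix followed by X bytes, that "4+Y bytes" means a 4-byte length prefix followed by Y bytes, and that integers and doubles are big-endian; none of these is stated in the diff. The `field_type` argument stands in for the 1-byte `FIELD` marker, whose value the diff does not give, and all function names are hypothetical.

```python
import struct

# Enum values taken from the "HNSW graph entry types" table in the diff.
HNSW_NODE = 1
HNSW_EDGE = 2


def _sized(data: bytes, prefix_fmt: str) -> bytes:
    """Prefix data with its length (assumed meaning of '1+X' / '4+Y' bytes)."""
    return struct.pack(prefix_fmt, len(data)) + data


def _key_prefix(namespace: bytes, field_type: int, index_name: bytes,
                field_name: bytes, level: int, hnsw_type: int) -> bytes:
    """Shared part of node and edge keys, up to and including 'hnsw type'."""
    return (
        _sized(namespace, ">B")             # namespace: 1+X bytes
        + struct.pack(">B", field_type)     # FIELD marker: 1 byte (value is an assumption)
        + _sized(index_name, ">I")          # index name: 4+Y bytes
        + _sized(field_name, ">I")          # field name: 4+Z bytes
        + struct.pack(">HB", level, hnsw_type)  # level: 2 bytes, hnsw type: 1 byte
    )


def encode_node_key(namespace, field_type, index_name, field_name,
                    level, user_key) -> bytes:
    prefix = _key_prefix(namespace, field_type, index_name, field_name,
                         level, HNSW_NODE)
    return prefix + _sized(user_key, ">I")  # user key: 4+B bytes


def encode_edge_key(namespace, field_type, index_name, field_name,
                    level, user_key_1, user_key_2) -> bytes:
    prefix = _key_prefix(namespace, field_type, index_name, field_name,
                         level, HNSW_EDGE)
    # Both endpoints are length-prefixed; the stored value is empty (0 byte).
    return prefix + _sized(user_key_1, ">I") + _sized(user_key_2, ">I")


def encode_node_value(num_neighbours: int, vector: list) -> bytes:
    """num of neighbours (2 bytes), dimension (2 bytes), then dimension * 8 bytes."""
    return struct.pack(f">HH{len(vector)}d", num_neighbours, len(vector), *vector)
```

Because the level and hnsw type sit directly after the field name, all entries for one field and level share a common key prefix under this layout, which is what makes a prefix scan over a level's nodes or edges possible.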