This is an automated email from the ASF dual-hosted git repository.

twice pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/kvrocks-website.git


The following commit(s) were added to refs/heads/main by this push:
     new b476325  Update KVrocks search index encoding documentation (#241)
b476325 is described below

commit b476325ee40786c8625591d9020ac8e5e28229e5
Author: Rebecca Zhou <[email protected]>
AuthorDate: Thu Aug 1 18:03:39 2024 -0700

    Update KVrocks search index encoding documentation (#241)
    
    * Update KVrocks search index encoding documentation
    
    * Update community/kvrocks-search-index-encoding.md
    
    * Update community/kvrocks-search-index-encoding.md
    
    ---------
    
    Co-authored-by: Twice <[email protected]>
---
 community/kvrocks-search-index-encoding.md | 50 ++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/community/kvrocks-search-index-encoding.md b/community/kvrocks-search-index-encoding.md
index 0e5faba..c115a5f 100644
--- a/community/kvrocks-search-index-encoding.md
+++ b/community/kvrocks-search-index-encoding.md
@@ -33,6 +33,7 @@ The common encoding format of key is as follows:
 |-------------|-------------|
 | tag         | 1           |
 | numeric     | 2           |
+| vector      | 3           |
 
 The common encoding format of a *field flag* is:
 
@@ -92,6 +93,28 @@ where *separator* currently can only be an ASCII character, and case sensitive c
 | 1+X bytes |  1 byte    | 4+Y bytes  | 4+Z bytes  |  ->  |   1 byte   |
 ```
 
+### HNSW Vector Field Metadata
+
+This metadata format is specifically designed to support efficient vector search using the HNSW (Hierarchical Navigable Small World) algorithm. The encoding captures various parameters and settings relevant for managing the vector index properties and optimizing vector search operations.
+
+```
+| namespace | FIELD_META | index name | field name |      | field flag | vector type | dimension | distance metric | initial cap |     m     | ef construction | ef runtime | epsilon | number of levels |
+|-----------|------------|------------|------------|  ->  |------------|-------------|-----------|-----------------|-------------|-----------|-----------------|------------|---------|------------------|
+| 1+X bytes |  1 byte    | 4+Y bytes  | 4+Z bytes  |  ->  |   1 byte   |   1 byte    |  2 bytes  |     1 byte      |   4 bytes   |  2 bytes  |     4 bytes     |  4 bytes   | 8 bytes |     2 bytes      |
+```
+#### Required attributes
+- **vector type**: Specifies the type of vectors stored (e.g., `FLOAT32`, `FLOAT64`). Kvrocks currently supports only `FLOAT64`.
+- **dimension**: The dimensionality of the vectors (number of elements in each vector).
+- **distance metric**: Metric used for distance calculation between vectors (i.e. `L2`, `IP`, `COSINE`).
+
+#### Optional attributes
+- **initial cap**: Initial capacity of the HNSW graph, indicating the initial number of elements; default is 500000.
+- **m**: Maximum number of edges per node in the HNSW graph; default is 16.
+- **ef construction**: Size of the dynamic candidate list during the index construction phase; default is 200.
+- **ef runtime**: Size of the dynamic candidate list during the search phase; default is 10.
+- **epsilon**: Epsilon value for approximate search, controlling the trade-off between search precision and speed; default is 0.01.
+- **number of levels**: Number of levels in the HNSW graph, affecting the hierarchical structure of the graph.
+
 ## Index data encoding
 
 Index data refers to the information stored after indexing the real data,
@@ -112,3 +135,30 @@ which is used to quickly get corresponding data in subsequent query processes.
 |-----------|---------|------------|------------|-----------------|------------|  ->  |------------|
 | 1+X bytes | 1 byte  | 4+Y bytes  | 4+Z bytes  |    8 bytes      | 4+B bytes  |  ->  |   0 byte   |
 ```
+
+### HNSW Vector field
+
+#### HNSW graph entry types
+
+|  hnsw type   | enum value  |
+|--------------|-------------|
+|  NODE        | 1           |
+|  EDGE        | 2           |
+
+#### HNSW node index encoding
+
+```
+| namespace | FIELD   | index name | field name |   level   |    hnsw type   | user key   |      | num of neighbours | vector dimension |      vector data      |
+|-----------|---------|------------|------------|-----------|----------------|------------|  ->  |-------------------|------------------|-----------------------|
+| 1+X bytes | 1 byte  | 4+Y bytes  | 4+Z bytes  |  2 bytes  |  NODE (1 byte) | 4+B bytes  |  ->  |       2 bytes     |      2 bytes     |  dimension * 8 bytes  |
+```
+
+#### HNSW edge index encoding
+
+```
+| namespace | FIELD   | index name | field name |   level   |    hnsw type   | user key 1 | user key 2 |      |    null    |
+|-----------|---------|------------|------------|-----------|----------------|------------|------------|  ->  |------------|
+| 1+X bytes | 1 byte  | 4+Y bytes  | 4+Z bytes  |  2 bytes  |  EDGE (1 byte) | 4+B bytes  | 4+B bytes  |  ->  |   0 byte   |
+```
+
+where *user key 1* and *user key 2* represent the endpoints of an edge at a specific level within the HNSW graph.
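
The layouts added in this commit can be illustrated with a minimal Python sketch. It is not the Kvrocks implementation, and it rests on several assumptions that the diff does not confirm: integers are written big-endian, "1+X bytes" means a 1-byte length followed by X bytes, "4+Y bytes" means a 4-byte length followed by Y bytes, and epsilon is an 8-byte IEEE double. `FIELD_TAG` is a hypothetical placeholder, and the vector type and distance metric are passed as raw integers because their enum values are not listed here. All function names are made up for illustration.

```python
import struct

# Enum values taken from the "HNSW graph entry types" table.
HNSW_NODE = 1
HNSW_EDGE = 2

# ASSUMPTION: hypothetical placeholder; the FIELD tag byte is not given in this diff.
FIELD_TAG = 0x02


def short_prefixed(data: bytes) -> bytes:
    # "1+X bytes": assumed to be a 1-byte length followed by the payload.
    return struct.pack(">B", len(data)) + data


def long_prefixed(data: bytes) -> bytes:
    # "4+Y bytes": assumed to be a 4-byte big-endian length followed by the payload.
    return struct.pack(">I", len(data)) + data


def key_prefix(namespace, index_name, field_name, level, hnsw_type):
    # namespace | FIELD | index name | field name | level (2 bytes) | hnsw type (1 byte)
    return (
        short_prefixed(namespace)
        + struct.pack(">B", FIELD_TAG)
        + long_prefixed(index_name)
        + long_prefixed(field_name)
        + struct.pack(">HB", level, hnsw_type)
    )


def node_key(namespace, index_name, field_name, level, user_key):
    # Node key: common prefix with type NODE, then the user key.
    prefix = key_prefix(namespace, index_name, field_name, level, HNSW_NODE)
    return prefix + long_prefixed(user_key)


def edge_key(namespace, index_name, field_name, level, user_key_1, user_key_2):
    # Edge key: common prefix with type EDGE, then both endpoints; the value is empty.
    prefix = key_prefix(namespace, index_name, field_name, level, HNSW_EDGE)
    return prefix + long_prefixed(user_key_1) + long_prefixed(user_key_2)


def node_value(num_neighbours, vector):
    # num of neighbours (2 bytes) | vector dimension (2 bytes) | dimension * 8 bytes
    header = struct.pack(">HH", num_neighbours, len(vector))
    return header + struct.pack(f">{len(vector)}d", *vector)


def field_meta_value(field_flag, vector_type, dimension, distance_metric,
                     initial_cap=500000, m=16, ef_construction=200,
                     ef_runtime=10, epsilon=0.01, num_levels=0):
    # Field widths follow the metadata table: 1, 1, 2, 1, 4, 2, 4, 4, 8, 2 bytes.
    # Defaults are the documented ones; num_levels=0 is an arbitrary placeholder.
    return struct.pack(
        ">BBHBIHIIdH",
        field_flag, vector_type, dimension, distance_metric,
        initial_cap, m, ef_construction, ef_runtime, epsilon, num_levels,
    )


if __name__ == "__main__":
    print(node_key(b"ns", b"idx", b"vec", 0, b"doc1").hex())
    print(edge_key(b"ns", b"idx", b"vec", 0, b"doc1", b"doc2").hex())
    print(len(node_value(3, [1.0, 2.0, 3.0, 4.0])))
```

Under these assumptions the fixed-width part of the metadata value adds up to 29 bytes, and a node value occupies 4 bytes plus 8 bytes per vector element, which matches the `dimension * 8 bytes` column for `FLOAT64` vectors.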
