This is an automated email from the ASF dual-hosted git repository.
twice pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/kvrocks-website.git
The following commit(s) were added to refs/heads/main by this push:
new 741b371 Add a new page for search encoding (#239)
741b371 is described below
commit 741b371cde63e7c2c6b933a2b39905bf40338bf8
Author: Twice <[email protected]>
AuthorDate: Thu Aug 1 21:48:15 2024 +0900
Add a new page for search encoding (#239)
---
community/kvrocks-search-index-encoding.md | 114 +++++++++++++++++++++++++++++
sidebarsCommunity.js | 1 +
2 files changed, 115 insertions(+)
diff --git a/community/kvrocks-search-index-encoding.md b/community/kvrocks-search-index-encoding.md
new file mode 100644
index 0000000..0e5faba
--- /dev/null
+++ b/community/kvrocks-search-index-encoding.md
@@ -0,0 +1,114 @@
+# Index encoding format for Kvrocks Search
+
+Different from [the encoding method of other data structures](https://kvrocks.apache.org/community/data-structure-on-rocksdb) in Kvrocks (e.g. String, Hash, ZSet ...),
+Apache Kvrocks™ Search (a.k.a. Kvrocks Search) uses an independent column family (named `search`)
+and a separately designed encoding format to store indexing-related metadata and data.
+
+WARNING: Kvrocks Search is currently in development and has not been officially released,
+so its encoding format may undergo breaking changes.
+
+## Common encoding
+
+### Key types
+
+| key type | enum value |
+|-------------|-------------|
+| INDEX_META | 0 |
+| PREFIXES | 1 |
+| FIELD_META | 2 |
+| FIELD | 3 |
+| FIELD_ALIAS | 4 |
+
+The common encoding format of key is as follows:
+```
++-------------+-------------+-------------+----------------+-------------+-------------------+
+|  ns size    |  namespace  |  key type   | idx name size  | index name  |  other fields...  |
+| (1byte: X)  |   (Xbyte)   |   (1byte)   |  (4bytes: Y)   |  (Y bytes)  |    (variable)     |
++-------------+-------------+-------------+----------------+-------------+-------------------+
+```
+
+### Field types and flags
+
+| field type | enum value |
+|-------------|-------------|
+| tag | 1 |
+| numeric | 2 |
+
+The common encoding format of a *field flag* is:
+
+```
+| 8 bit |
+|----------------------------------------------------|
+| noindex: 1bit | field type: 4bit | reserved: 3bit |
+```
+
+## Metadata encoding
+
+In Kvrocks Search, metadata refers to the metadata of an index (also known as a schema),
+including some properties of the index, which fields are included in this index,
+what type each field is, and what properties they have.
+
+### Index metadata
+
+```
+| namespace | INDEX_META | index name |    | index flag | on data type  |
+|-----------|------------|------------| -> |------------|---------------|
+| 1+X bytes | 1 byte     | 4+Y bytes  | -> | 1 byte     | 1 byte        |
+```
+
+where *index flag* is currently 8 bits, all reserved (equal to `0`), and *on data type* is one of:
+
+| on data type | enum value |
+|--------------|-------------|
+| HASH | 2 |
+| JSON | 10 |
+
+### Index prefixes
+
+```
+| namespace | PREFIXES   | index name |    | prefix strings... |
+|-----------|------------|------------| -> |-------------------|
+| 1+X bytes | 1 byte     | 4+Y bytes  | -> | (4+Zi)*N bytes    |
+```
+
+Index prefixes are used to determine which keys belong to the tracking scope of this index.
+They consist of an array of strings, where each string is a key prefix.
+
+### Tag field metadata
+
+```
+| namespace | FIELD_META | index name | field name |    | field flag | separator  | case sensitive |
+|-----------|------------|------------|------------| -> |------------|------------|----------------|
+| 1+X bytes | 1 byte     | 4+Y bytes  | 4+Z bytes  | -> | 1 byte     | 1 byte     | 1 byte         |
+```
+
+where *separator* currently can only be an ASCII character, and *case sensitive* can be `0` (false) or `1` (true).
+
+### Numeric field metadata
+
+```
+| namespace | FIELD_META | index name | field name |    | field flag |
+|-----------|------------|------------|------------| -> |------------|
+| 1+X bytes | 1 byte     | 4+Y bytes  | 4+Z bytes  | -> | 1 byte     |
+```
+
+## Index data encoding
+
+Index data refers to the information stored after indexing the real data,
+which is used to quickly get corresponding data in subsequent query processes.
+
+### Tag field
+
+```
+| namespace | FIELD   | index name | field name | tag value  | user key   |    | null       |
+|-----------|---------|------------|------------|------------|------------| -> |------------|
+| 1+X bytes | 1 byte  | 4+Y bytes  | 4+Z bytes  | 4+A bytes  | 4+B bytes  | -> | 0 byte     |
+```
+
+### Numeric field
+
+```
+| namespace | FIELD   | index name | field name | floating number | user key   |    | null       |
+|-----------|---------|------------|------------|-----------------|------------| -> |------------|
+| 1+X bytes | 1 byte  | 4+Y bytes  | 4+Z bytes  | 8 bytes         | 4+B bytes  | -> | 0 byte     |
+```
diff --git a/sidebarsCommunity.js b/sidebarsCommunity.js
index 0fdae38..7917dd3 100644
--- a/sidebarsCommunity.js
+++ b/sidebarsCommunity.js
@@ -12,6 +12,7 @@ const sidebars = {
},
items: [
'data-structure-on-rocksdb',
+ 'kvrocks-search-index-encoding',
]
},
{
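
Illustration (not part of this commit): the key layouts documented in community/kvrocks-search-index-encoding.md above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions. The page does not state the byte order of the 4-byte size fields, so big-endian is assumed here. All helper names (`sized`, `common_prefix`, `tag_field_key`, `index_meta_value`) are hypothetical and do not come from the Kvrocks code base. The numeric field key is left out because the page does not describe how the 8-byte floating number is encoded.

```python
import struct

# Key type enum values, copied from the "Key types" table.
INDEX_META, PREFIXES, FIELD_META, FIELD, FIELD_ALIAS = range(5)

# "on data type" enum values, copied from the "Index metadata" table.
HASH, JSON = 2, 10


def sized(data: bytes) -> bytes:
    """Prefix data with its 4-byte length (big-endian is an assumption)."""
    return struct.pack(">I", len(data)) + data


def common_prefix(namespace: bytes, key_type: int, index_name: bytes) -> bytes:
    """ns size (1 byte) | namespace | key type (1 byte) | idx name size (4 bytes) | index name."""
    if len(namespace) > 255:
        raise ValueError("namespace size must fit in one byte")
    return bytes([len(namespace)]) + namespace + bytes([key_type]) + sized(index_name)


def tag_field_key(namespace: bytes, index_name: bytes, field_name: bytes,
                  tag_value: bytes, user_key: bytes) -> bytes:
    """Key of a tag field index entry; the stored value is empty (0 bytes)."""
    return (common_prefix(namespace, FIELD, index_name)
            + sized(field_name) + sized(tag_value) + sized(user_key))


def index_meta_value(on_data_type: int) -> bytes:
    """Value of an INDEX_META entry: index flag (all reserved, 0) | on data type."""
    return bytes([0, on_data_type])


if __name__ == "__main__":
    key = tag_field_key(b"ns", b"idx", b"color", b"red", b"doc1")
    print(key.hex(" "))
```

Because every variable-length component carries its own size prefix, all entries for one index and field share a common byte prefix, which is what makes a prefix scan over the `search` column family possible.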