This is an automated email from the ASF dual-hosted git repository. twice pushed a commit to branch search-encoding in repository https://gitbox.apache.org/repos/asf/kvrocks-website.git
commit ce50f5ed761cb70489f279a038b227d455070dbd Author: PragmaTwice <[email protected]> AuthorDate: Thu Aug 1 21:03:01 2024 +0900 Add a new page for search encoding --- community/kvrocks-search-index-encoding.md | 114 +++++++++++++++++++++++++++++ sidebarsCommunity.js | 1 + 2 files changed, 115 insertions(+) diff --git a/community/kvrocks-search-index-encoding.md b/community/kvrocks-search-index-encoding.md new file mode 100644 index 0000000..0e5faba --- /dev/null +++ b/community/kvrocks-search-index-encoding.md @@ -0,0 +1,114 @@ +# Index encoding format for Kvrocks Search + +Different from [the encoding method of other data structures](https://kvrocks.apache.org/community/data-structure-on-rocksdb) in Kvrocks (e.g. String, Hash, ZSet ...), +Apache Kvrocks™ Search (a.k.a. Kvrocks Search) uses an independent column family (named `search`) +and a separately designed encoding format to store indexing-related metadata and data. + +WARNING: Kvrocks Search is currently in development and has not been officially released, +so its encoding format may undergo breaking changes. + +## Common encoding + +### Key types + +| key type | enum value | +|-------------|-------------| +| INDEX_META | 0 | +| PREFIXES | 1 | +| FIELD_META | 2 | +| FIELD | 3 | +| FIELD_ALIAS | 4 | + +The common encoding format of key is as follows: +``` ++-------------+-------------+-------------+----------------+-------------+-------------------+ +| ns size | namespace | key type | idx name size | index name | other fields... | +| (1byte: X) | (Xbyte) | (1byte) | (4bytes: Y) | (Y bytes) | (variable) | ++-------------+-------------+-------------+----------------+-------------+-------------------+ +``` + +### Field types and flags + +| field type | enum value | +|-------------|-------------| +| tag | 1 | +| numeric | 2 | + +The common encoding format of a *field flag* is: + +``` +| 8 bit | +|----------------------------------------------------| +| noindex: 1bit | field type: 4bit | reserved: 3bit | +``` + +## Metadata encoding + +In Kvrocks Search, metadata refers to the metadata of an index (also known as a schema), +including some properties of the index, which fields are included in this index, +what type each field is, and what properties they have. + +### Index metadata + +``` +| namespace | INDEX_META | index name | | index flag | on data type | +|-----------|------------|------------| -> |------------|---------------| +| 1+X bytes | 1 byte | 4+Y bytes | -> | 1 byte | 1 byte | +``` + +where *index flag* is currently 8-bit all reserved (equals to `0`), and *on data type* is one of: + +| on data type | enum value | +|--------------|-------------| +| HASH | 2 | +| JSON | 10 | + +### Index prefixes + +``` +| namespace | PREFIXES | index name | | prefix strings... | +|-----------|------------|------------| -> |-------------------| +| 1+X bytes | 1 byte | 4+Y bytes | -> | (4+Zi)*N bytes | +``` + +Index prefixes are used to determine which keys belong to the tracking scope of this index. +It consists of an array of strings, where each string is a key prefix. + +### Tag field metadata + +``` +| namespace | FIELD_META | index name | field name | | field flag | separator | case sensitive | +|-----------|------------|------------|------------| -> |------------|------------|----------------| +| 1+X bytes | 1 byte | 4+Y bytes | 4+Z bytes | -> | 1 byte | 1 byte | 1 byte | +``` + +where *separator* currently can only be an ASCII character, and case sensitive can be `0` (false) or `1` (true). + +### Numeric field metadata + +``` +| namespace | FIELD_META | index name | field name | | field flag | +|-----------|------------|------------|------------| -> |------------| +| 1+X bytes | 1 byte | 4+Y bytes | 4+Z bytes | -> | 1 byte | +``` + +## Index data encoding + +Index data refers to the information stored after indexing the real data, +which is used to quickly get corresponding data in subsequent query processes. + +### Tag field + +``` +| namespace | FIELD | index name | field name | tag value | user key | | null | +|-----------|---------|------------|------------|------------|------------| -> |------------| +| 1+X bytes | 1 byte | 4+Y bytes | 4+Z bytes | 4+A bytes | 4+B bytes | -> | 0 byte | +``` + +### Numeric field + +``` +| namespace | FIELD | index name | field name | floating number | user key | | null | +|-----------|---------|------------|------------|-----------------|------------| -> |------------| +| 1+X bytes | 1 byte | 4+Y bytes | 4+Z bytes | 8 bytes | 4+B bytes | -> | 0 byte | +``` diff --git a/sidebarsCommunity.js b/sidebarsCommunity.js index 0fdae38..7917dd3 100644 --- a/sidebarsCommunity.js +++ b/sidebarsCommunity.js @@ -12,6 +12,7 @@ const sidebars = { }, items: [ 'data-structure-on-rocksdb', + 'kvrocks-search-index-encoding', ] }, {
