This is an automated email from the ASF dual-hosted git repository.
twice pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/kvrocks-website.git
The following commit(s) were added to refs/heads/main by this push:
new 7b5d9cf1 Add tdigest encoding details for documentation. (#274)
7b5d9cf1 is described below
commit 7b5d9cf124857f4dfa31966692869cd75142f9d4
Author: Edward Xu <[email protected]>
AuthorDate: Sun Mar 9 21:50:42 2025 +0800
Add tdigest encoding details for documentation. (#274)
---
README.md | 2 +-
community/data-structure-on-rocksdb.md | 52 ++++++++++++++++++++++++++++++++++
docs/supported-commands.md | 3 ++
3 files changed, 56 insertions(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 2af37390..1f5100b9 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@ This project keeps all sources used for building the Apache
Kvrocks official web
* Submit [an issue](https://github.com/apache/kvrocks/issues/new) on the [main
repo](http://github.com/apache/kvrocks)
* Send an email to the [dev mailing list](mailto:[email protected])
([subscribe](mailto:[email protected]))
-* Ask on the [#general channel on Kvrocks
Slack](https://kvrocks.slack.com/archives/general)
([join](https://join.slack.com/t/kvrockscommunity/shared_invite/zt-p5928e3r-OUAK8SUgC8GOceGM6dAz6w))
+* [Chat on Zulip](https://kvrocks.zulipchat.com/)
## Prerequisite
diff --git a/community/data-structure-on-rocksdb.md
b/community/data-structure-on-rocksdb.md
index b0009827..d7b86f0e 100644
--- a/community/data-structure-on-rocksdb.md
+++ b/community/data-structure-on-rocksdb.md
@@ -47,6 +47,7 @@ The values encoded for other data types in flags can be found
in the table below
| BloomFilter| 9 |
| JSON | 10 |
| Hyperloglog| 11 |
+| TDigest | 12 |
In the encoding version `0`, `expire` is stored in seconds and as a 4byte
field (32bit integer), `size` is stored as also a 4byte field (32bit integer);
while in the encoding version `1`, `expire` is stored in milliseconds and as a
8byte field (64bit integer), `size` is stored as also a 8byte field (64bit
integer).
@@ -401,3 +402,54 @@ The register index is calculated using the first 14 bits
of the user element's h
The length of consecutive zeros is calculated using the last 50 digits of the
hash value of the user key.
Inspired by the bitmap implementation, HyperLogLog divides the register array
into 16 segments, each with 1024 registers.
+
+## TDigest
+
+[Redis TDigest](https://redis.io/blog/t-digest-in-redis-stack/) is a data
structure that estimates the quantiles of a dataset.
+It could be used for monitoring, online gaming, predictive analytics, and
other applications that require real-time data analysis.
+
+[The original paper](https://arxiv.org/abs/1902.04023) is based on the idea of
clustering data points into centroids, which are then used to estimate the
quantiles of the dataset.
+
+The key points of TDigest are parts below:
+
+1. Scale function: the scale function determines the distribution of the
centroids. For a certain scale, one parameter called compression will change
the density of the centroids.
+2. Merge: when the number of centroids exceeds a certain threshold, the
TDigest will merge the centroids to reduce the memory usage and let it satisfy
the compression parameter after merging. The merge could also be used to merge
two TDigests.
+3. Quantile estimation: when try to calculate the quantile, the TDigest will
find the nearest two centroids and interpolate the quantile value.
+
+In Kvrocks, the TDigest data structure is stored in following three parts:
+
+1. TDigest metadata
+2. TDigest buffer of data for batch inserting and merging into centroids
+3. TDigest centroids
+
+#### TDigest metadata
+
+```text
+
+----------+------------+-----------+----------------+----------------+-----------------+-----------------+---------------+----------------+----------------+--------------------+--------------+
+ key => | flags | expire | version | compression | capacity
| unmerged_nodes | merged_nodes | total_weight | minimum |
maximum | total_observations | merge_times |
+ | (1byte) | (Ebyte) | (8byte) | uint32(4byte) |
uint32(4byte) | uint64(8byte) | uint64(8byte) | uint64(8byte) |
double(8byte) | double(8byte) | uint64(8byte) | uint64(8byte)|
+
+----------+------------+-----------+----------------+----------------+-----------------+-----------------+---------------+----------------+----------------+--------------------+--------------+
+```
+
+#### TDigest buffer sub keys-values
+
+`buffer` is serialized as a list of double values.
+
+```text
+ +--------------+
+ key|version|BUFFER_FLAG => | buffer |
+ +--------------+
+```
+
+#### TDigest centroids sub keys-values
+
+`mean` is the mean of each centroid and use [memcomparable
serialization](https://github.com/apache/kvrocks/blob/80168018b6fdbc0bdd549cf78cd5798556eb070f/src/common/encoding.cc#L32)
to make iteration keep the original order of mean.
+
+During each merge, we will flush the buffer to the centroids and merge the
centroids to satisfy the compression parameter.
+
+```text
+ +----------------+
+ key|version|CENTROID_FLAG|mean => | weight |
+ | (8byte) double |
+ +----------------+
+```
diff --git a/docs/supported-commands.md b/docs/supported-commands.md
index c6dfd651..c95edf56 100644
--- a/docs/supported-commands.md
+++ b/docs/supported-commands.md
@@ -439,3 +439,6 @@ In addition, `LISTFUNC` subcommand is added as an extension
to list all function
| ------------------- | ---------------- | ------------- |
---------------------------------------------------------- |
| TDIGEST.CREATE | ✓ | unstable |
|
| TDIGEST.INFO | ✓ | unstable |
|
+| TDIGEST.ADD | ✓ | unstable |
|
+| TDIGEST.MIN | ✓ | unstable |
|
+| TDIGEST.MAX | ✓ | unstable |
|