This is an automated email from the ASF dual-hosted git repository.
hulk pushed a commit to branch unstable
in repository https://gitbox.apache.org/repos/asf/incubator-kvrocks.git
The following commit(s) were added to refs/heads/unstable by this push:
new 8e9b94e Remove docs in repo and apply apache license header for some files (#740)
8e9b94e is described below
commit 8e9b94ed0b23e1ba7608f89665bf6cb56ac83d48
Author: hulk <[email protected]>
AuthorDate: Thu Jul 21 21:08:30 2022 +0800
Remove docs in repo and apply apache license header for some files (#740)
---
README.md | 19 ++++
docs/custom-api-sortedint.md | 58 -----------
docs/metadata-design.md | 186 ---------------------------------
docs/replication-design.md | 49 ---------
docs/source-code-overview.CN.md | 220 ----------------------------------------
src/version.h.in | 20 ++++
6 files changed, 39 insertions(+), 513 deletions(-)
diff --git a/README.md b/README.md
index 4c9a939..ace584b 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,22 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
<img src="docs/images/kvrocks_logo.png" alt="kvrocks_logo" width="350"/>
[](https://github.com/apache/incubator-kvrocks/actions/workflows/kvrocks.yaml)
diff --git a/docs/custom-api-sortedint.md b/docs/custom-api-sortedint.md
deleted file mode 100644
index 1876d8e..0000000
--- a/docs/custom-api-sortedint.md
+++ /dev/null
@@ -1,58 +0,0 @@
-# Custom api sortedint
-* The following example demonstrates how to use sortedint to paginate an id list
-```
-redis> SIADD mysi 1
-(integer) 1
-redis> SIADD mysi 2
-(integer) 1
-redis> SIADD mysi 2
-(integer) 0
-redis> SIADD mysi 3 4 5 123 245
-(integer) 5
-redis> SICARD mysi
-(integer) 7
-redis> SIREVRANGE mysi 0 3
-1) 245
-2) 123
-3) 5
-redis> SIREVRANGE mysi 0 3 cursor 5
-1) 4
-2) 3
-3) 2
-redis> SIRANGE mysi 0 3 cursor 123
-1) 245
-redis> SIRANGEBYVALUE mysi 1 (5
-1) "1"
-2) "2"
-3) "3"
-4) "4"
-redis> SIREVRANGEBYVALUE mysi 5 (1
-1) "5"
-2) "4"
-3) "3"
-4) "2"
-redis> SIEXISTS mysi 1 88 2
-1) 1
-2) 0
-3) 1
-redis> SIREM mysi 2
-(integer) 1
-redis>
-```
-* siexists key member1 (member2 ...)
-```
-Return value
-Array reply: list of integers for the specified members, specifically:
- 1 if the member exists.
- 0 if the member does not exist.
-```
-* sirangebyvalue key min max (LIMIT offset count)
-```
-like zrangebyscore.
-```
-* sirevrangebyvalue key max min (LIMIT offset count)
-```
-like zrevrangebyscore.
-```
\ No newline at end of file
diff --git a/docs/metadata-design.md b/docs/metadata-design.md
deleted file mode 100644
index f7c9237..0000000
--- a/docs/metadata-design.md
+++ /dev/null
@@ -1,186 +0,0 @@
-# Design Complex Structure On Rocksdb
-
-kvrocks uses rocksdb as its storage engine. rocksdb is developed by Facebook and built on LevelDB with many extra features, like column families, transactions and backup; see the rocksdb wiki: [Features Not In LevelDB](https://github.com/facebook/rocksdb/wiki/Features-Not-in-LevelDB). The basic operations in rocksdb are `Put(key, value)`, `Get(key)`, `Delete(key)`; other complex structures aren't supported. The main goal of this doc is to explain how we built the Redis hash/list/set/zset/bitmap on [...]
-
-## String
-
-Redis string is a key-value with an expire time, so it's very easy to translate a Redis string into a rocksdb key-value.
-
-```shell
- +----------+------------+--------------------+
-key => | flags | expire | payload |
- | (1byte) | (4byte) | (Nbyte) |
- +----------+------------+--------------------+
-```
-
-we prepend a 1-byte `flags` and a 4-byte expire before the user's value:
-
-- `flags` tells kvrocks the type of this key-value, which may be `string`/`hash`/`list`/`zset`/`bitmap`
-- `expire` stores the absolute time at which the key expires; zero means the key-value never expires
-- `payload` is the user's raw value
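This layout can be sketched in a few lines. The following is an illustrative Python sketch only: the field widths follow the diagram above, but the flag constant and byte order are assumptions, not kvrocks' real C++ encoding.

```python
import struct

# Encode a string value per the diagram: 1-byte flags + 4-byte expire + payload.
# FLAG_STRING and little-endian byte order are illustrative assumptions.
FLAG_STRING = 0

def encode_string(payload: bytes, expire: int = 0, flags: int = FLAG_STRING) -> bytes:
    # "<BI" = little-endian: unsigned char (flags), unsigned 32-bit int (expire)
    return struct.pack("<BI", flags, expire) + payload

def decode_string(value: bytes):
    flags, expire = struct.unpack("<BI", value[:5])
    return flags, expire, value[5:]

encoded = encode_string(b"hello", expire=1658000000)
assert decode_string(encoded) == (FLAG_STRING, 1658000000, b"hello")
```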
-
-## Hash
-
-Redis hashmap (dict) is like the hashmap in many programming languages: it implements an associative array, a structure that maps keys to values. The direct way to implement a hash in rocksdb is to serialize the keys/values into one value and store it like a string, but the drawback is the performance impact as the keys/values grow. So we split the hash sub keys/values into individual key-values in rocksdb and track them with metadata.
-
-#### hash metadata
-
-```shell
- +----------+------------+-----------+-----------+
-key => | flags | expire | version | size |
- | (1byte) | (4byte) | (8byte) | (4byte) |
- +----------+------------+-----------+-----------+
-```
-
-the value of the key (which we call metadata here) stores the metadata of the hash key, including:
-
-- `flags`: like the string, this field shows what type the key is
-- `expire`: same as the string type, records the expiration time
-- `version`: used to achieve fast deletion when the number of sub keys/values grows large
-- `size`: records the number of sub keys/values under this hash key
-
-#### hash sub keys-values
-
-we use extra key-values to store the hash fields; the format is like below:
-
-```shell
- +---------------+
-key|version|field => | value |
- +---------------+
-```
-
-we prepend the hash `key` and `version` before the hash field; the value of `version` comes from the metadata. For example, when the request `hget h1 f1` is received, kvrocks fetches the metadata by hash key (here `h1`), concatenates the hash key, version, and field into a new key, then fetches the value with that new key.
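To make that lookup path concrete, here is a hypothetical Python sketch of `hget h1 f1`. The in-memory dicts standing in for the rocksdb column families and the literal `|` separator are inventions for illustration, not the real on-disk encoding.

```python
# Illustrative sketch: compose the sub key as key|version|field and look it up.
metadata = {b"h1": {"version": 7, "size": 1}}   # stand-in for the metadata column family
subkeys = {b"h1|7|f1": b"v1"}                    # stand-in for the sub key-value column family

def compose_key(key: bytes, version: int, field: bytes) -> bytes:
    return b"%s|%d|%s" % (key, version, field)

def hget(key: bytes, field: bytes):
    meta = metadata.get(key)            # 1. fetch the metadata by hash key
    if meta is None:
        return None
    # 2. concatenate key, version, field into the new key and fetch it
    return subkeys.get(compose_key(key, meta["version"], field))

assert hget(b"h1", b"f1") == b"v1"
assert hget(b"h1", b"nope") is None
```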
-
-
-
-***Question1: why store version in the metadata***
-
-> Suppose millions of sub key-values are stored under one hash key. If the user deletes this key, kvrocks must iterate over millions of sub key-values and delete them, which would cause a performance problem. With the version we can quickly delete the metadata and then recycle the other key-values in background compaction threads. The cost is that those tombstone keys take some disk storage. You can regard the version as an atomically incremented number, [...]
-
-
-
-***Question2: what can we do if the user key conflicts with the composed key?***
-
-> we store the metadata key and the composed key in different column families, so a conflict cannot happen.
-
-## Set
-
-Redis set can be regarded as a hash whose sub-key values are always null; the metadata is the same as the one in hash:
-
-```shell
- +----------+------------+-----------+-----------+
-key => | flags | expire | version | size |
- | (1byte) | (4byte) | (8byte) | (4byte) |
- +----------+------------+-----------+-----------+
-```
-
-and the sub keys-values in rocksdb would be:
-
-```shell
- +---------------+
-key|version|member => | NULL |
- +---------------+
-```
-
-## List
-
-#### list metadata
-
-Redis list is also organized as metadata plus sub key-values, but the sub key is an index instead of a user key. The metadata is like below:
-
-```shell
-        +----------+------------+-----------+-----------+-----------+-----------+
-key =>  |  flags   |   expire   |  version  |   size    |   head    |   tail    |
-        | (1byte)  |  (4byte)   |  (8byte)  |  (4byte)  |  (8byte)  |  (8byte)  |
-        +----------+------------+-----------+-----------+-----------+-----------+
-```
-
-- `head` is the starting position of the list head
-- `tail` is the stopping position of the list tail
-
-the meaning of the other fields is the same as in the other types; head/tail are added to record the boundaries of the list.
-
-#### list sub keys-values
-
-the sub key in list is composed of the list key, version and index; the index is calculated from the metadata's head or tail. For example, when the user requests `rpush list elem`, kvrocks fetches the metadata with the list key, generates the sub key from the list key, version and tail, increments the tail, then writes the metadata and the sub key's value back to rocksdb.
-
-```shell
- +---------------+
-key|version|index => | value |
- +---------------+
-```
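The rpush flow can be sketched as follows. This is a hypothetical Python sketch: the dicts stand in for rocksdb, and the starting head/tail value and the lpush counterpart are illustrative assumptions rather than kvrocks' actual choices.

```python
# Illustrative sketch of RPUSH/LPUSH: fetch metadata, write the element at the
# boundary index, then bump the boundary and size. head/tail start mid-range so
# the list can grow in both directions.
META = {"version": 1, "size": 0, "head": 1 << 32, "tail": 1 << 32}
subkeys = {}   # (key, version, index) => value

def rpush(key: bytes, elem: bytes) -> int:
    idx = META["tail"]                       # write at the current tail...
    subkeys[(key, META["version"], idx)] = elem
    META["tail"] += 1                        # ...then advance it
    META["size"] += 1
    return META["size"]

def lpush(key: bytes, elem: bytes) -> int:
    META["head"] -= 1                        # retreat the head, then write there
    subkeys[(key, META["version"], META["head"])] = elem
    META["size"] += 1
    return META["size"]

assert rpush(b"list", b"a") == 1
assert lpush(b"list", b"b") == 2
```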
-
-## ZSet
-
-Redis zset is a set with a sorted property, so it's a little different from other types: it must support searching by member as well as retrieving members by score range.
-
-#### zset metadata
-
-the metadata of zset is still the same as set, like below:
-
-```shell
- +----------+------------+-----------+-----------+
-key => | flags | expire | version | size |
- | (1byte) | (4byte) | (8byte) | (4byte) |
- +----------+------------+-----------+-----------+
-```
-
-#### zset sub keys-values
-
-The value of the sub key isn't null here, and we also need a way to range over members by score. So the zset has two types of sub key-values: one maps members to scores, and one supports score ranges.
-
-```shell
- +---------------+
-key|version|member => | score | (1)
- +---------------+
-
- +---------------+
-key|version|score|member => | NULL | (2)
- +---------------+
-```
-
-if the user wants to get the score of a member, or to check whether a member exists, kvrocks uses the first one.
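A minimal sketch of how the two encodings cooperate (hypothetical Python; the dict and set stand in for rocksdb, and all names are invented for illustration):

```python
# Encoding (1): member lookup for ZSCORE.  Encoding (2): ordered (score, member)
# pairs for range queries.  ZADD must keep both in sync.
member_to_score = {}   # (key, version, member) => score
score_index = set()    # (key, version, score, member) => NULL

def zadd(key, version, member, score):
    old = member_to_score.get((key, version, member))
    if old is not None:
        score_index.discard((key, version, old, member))  # drop the stale pair
    member_to_score[(key, version, member)] = score
    score_index.add((key, version, score, member))

def zscore(key, version, member):
    return member_to_score.get((key, version, member))    # uses encoding (1)

def zrangebyscore(key, version, lo, hi):
    return [m for (k, v, s, m) in sorted(score_index)     # uses encoding (2)
            if k == key and v == version and lo <= s <= hi]

zadd(b"z", 1, b"a", 3.0)
zadd(b"z", 1, b"b", 1.0)
assert zscore(b"z", 1, b"a") == 3.0
assert zrangebyscore(b"z", 1, 0.0, 2.0) == [b"b"]
```

In rocksdb the range scan falls out of key ordering rather than an explicit sort, which is why the score is embedded in the key of encoding (2).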
-
-## Bitmap
-
-Redis bitmap is the most interesting part of the kvrocks design. Unlike other types it has no natural sub key, and the value can become very large if the user treats it as a sparse array. Things would clearly break down if we stored the bitmap in a single value, so we break the bitmap value into multiple fragments. Another property of bitmap is writing at arbitrary indexes, which is very similar to the access model of Linux virtual memory; the bitmap design took its idea from that.
-
-#### bitmap metadata
-
-```shell
- +----------+------------+-----------+-----------+
-key => | flags | expire | version | size |
- | (1byte) | (4byte) | (8byte) | (4byte) |
- +----------+------------+-----------+-----------+
-```
-
-#### bitmap sub keys-values
-
-we break the bitmap value into fragments (1KiB, i.e. 8192 bits per fragment), and the sub key is the index of the fragment. For example, setting bit 1024 lands in the first fragment with index 0, while setting bit 80970 lands in the 10th fragment with index 9.
-
-```shell
- +---------------+
-key|version|index => | fragment |
- +---------------+
-```
-
-when the user requests the bit at position P, kvrocks first fetches the metadata with the bitmap's key and calculates the fragment index from the bit position, then fetches the bitmap fragment with the composed key and finds the bit at the fragment offset. For example, for `getbit bitmap 8193`, the fragment index is `1` (8193/8192) and the sub key is `bitmap|1|1` (when the version is 1); kvrocks then fetches that sub key from rocksdb and checks whether the bit at offset `1` (8193%8192) is set.
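The index math can be sketched as follows. This is a hypothetical Python sketch: the bit order within a byte here is an assumption for illustration, not necessarily kvrocks' actual order.

```python
# Illustrative sketch of GETBIT/SETBIT: 1KiB fragments = 8192 bits each.
BITS_PER_FRAGMENT = 8192
fragments = {}   # (key, version, fragment_index) => bytearray of 1024 bytes

def setbit(key, version, pos, value):
    frag_idx, offset = divmod(pos, BITS_PER_FRAGMENT)   # which fragment, which bit
    frag = fragments.setdefault((key, version, frag_idx), bytearray(1024))
    byte_idx, bit = divmod(offset, 8)
    if value:
        frag[byte_idx] |= 1 << bit
    else:
        frag[byte_idx] &= ~(1 << bit)

def getbit(key, version, pos):
    frag_idx, offset = divmod(pos, BITS_PER_FRAGMENT)
    frag = fragments.get((key, version, frag_idx))
    if frag is None:                                    # absent fragment => all zero bits
        return 0
    byte_idx, bit = divmod(offset, 8)
    return (frag[byte_idx] >> bit) & 1

setbit(b"bitmap", 1, 8193, 1)       # fragment index 1, offset 1, as in the example
assert getbit(b"bitmap", 1, 8193) == 1
assert getbit(b"bitmap", 1, 8192) == 0
```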
-
-## Sortedint
-
-Sortedint is a set whose members are integers, kept sorted in ascending order:
-
-```shell
- +----------+------------+-----------+-----------+
-key => | flags | expire | version | size |
- | (1byte) | (4byte) | (8byte) | (4byte) |
- +----------+------------+-----------+-----------+
-```
-
-and the sub keys-values in rocksdb would be:
-
-```shell
- +---------------+
-key|version|id => | NULL |
- +---------------+
-```
diff --git a/docs/replication-design.md b/docs/replication-design.md
deleted file mode 100644
index cb150dc..0000000
--- a/docs/replication-design.md
+++ /dev/null
@@ -1,49 +0,0 @@
-# Replication of rocksdb data
-
-An instance is turned into the slave role when the `SLAVEOF` cmd is received. The slave will
-try to do a partial synchronization (aka incremental replication) if it is viable;
-otherwise, the slave will do a full-sync by copying all of the master's latest rocksdb backup files.
-After the full-sync finishes, the slave's DB is erased and restored using
-the backup files downloaded from the master, then partial-sync is triggered again.
-
-If everything goes OK, the partial-sync is an ongoing procedure that keeps receiving
-every batch the master writes.
-
-## Replication State Machine
-
-A state machine is used in the slave's replication thread to manage this complexity.
-
-On the slave side, replication is composed of the following steps:
-
- 1. Send Auth
- 2. Send db\_name to check whether the master has the right DB
- 3. Try PSYNC; if it succeeds, the slave enters the loop of receiving batches; if not, go to `4`
- 4. Do FULLSYNC
-    4.1. send _fetch_meta to get the latest backup metadata
-    4.2. send _fetch_file to get all the backup files listed in the metadata
-    4.3. restore the slave's DB using the backup
- 5. goto `1`
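The steps above can be sketched as a small state machine. This is a hypothetical Python sketch; the state names and the transition helper are invented for illustration, not the actual kvrocks code.

```python
from enum import Enum, auto

class State(Enum):
    AUTH = auto()           # step 1
    CHECK_DB_NAME = auto()  # step 2
    TRY_PSYNC = auto()      # step 3
    FULL_SYNC = auto()      # step 4
    STREAMING = auto()      # loop of receiving batches

def next_state(state: State, ok: bool) -> State:
    if state is State.AUTH:
        return State.CHECK_DB_NAME if ok else State.AUTH
    if state is State.CHECK_DB_NAME:
        return State.TRY_PSYNC if ok else State.AUTH
    if state is State.TRY_PSYNC:
        # PSYNC viable -> keep receiving batches; otherwise fall back to full sync
        return State.STREAMING if ok else State.FULL_SYNC
    if state is State.FULL_SYNC:
        # after restoring the backup, restart from step 1 (goto `1`)
        return State.AUTH
    return State.STREAMING

s = State.AUTH
for ok in (True, True, False):   # auth ok, db name ok, PSYNC not viable
    s = next_state(s, ok)
assert s is State.FULL_SYNC
assert next_state(s, True) is State.AUTH
```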
-
-## Partial Synchronization (PSYNC)
-
-PSYNC takes advantage of rocksdb's WAL iterator. If the sequence number requested by PSYNC
-is within the range of the WAL files, PSYNC is considered viable.
-
-PSYNC is a command implemented on master-role instances. Unlike other commands (e.g. GET),
-PSYNC is not a REQ-RESP command but a REQ-RESP-RESP... style: the response never
-ends once the request is accepted.
-
-so PSYNC has two main parts in the code:
-- A: a libevent callback that sends batches when the WAL iterator has new data.
-- B: a timer callback: when A exits because the WAL data is exhausted, the timer callback
-  checks from time to time whether the WAL has new data, so as to wake A again.
-
-## Full Synchronization
-
-On the master side, to support full synchronization, the master creates a rocksdb backup
-every time a `_fetch_meta` request is received.
-
-On the slave side, after retrieving the metadata, the slave fetches every file listed in
-it (skipping files that already exist) and restores the backup. To speed things up a bit,
-file fetching is executed in parallel.
-
diff --git a/docs/source-code-overview.CN.md b/docs/source-code-overview.CN.md
deleted file mode 100644
index 7d99355..0000000
--- a/docs/source-code-overview.CN.md
+++ /dev/null
@@ -1,220 +0,0 @@
-# Overall Flow
-
-1. Load the Kvrocks configuration (parse the config file and build the `Config` object)
-2. Initialize and open the storage engine `Engine::Storage`
-3. Initialize the server `Server`
-4. Run the `Server`: execute `Server::Start()` and `Server::Join()`
-5. On receiving an interrupt signal, stop the `Server`
-
-   Call path: interrupt signal handler `signal_handler` -> `hup_handler` -> `Server::Stop()`
-
-# Storage Engine
-
-Related files:
-
-- storage.h, storage.cc
-
-## Engine::Storage
-
-Wraps the interface of the underlying storage engine (currently `RocksDB`) and provides the data storage interface for the Server.
-The `RocksDB` classes involved are:
-
-- `rocksdb::DB`
-- `rocksdb::BackupEngine`
-- `rocksdb::Env`
-- `rocksdb::SstFileManager`
-- `rocksdb::RateLimiter`
-- `rocksdb::ColumnFamilyHandle`
-
-## Storage::Open
-
-1. Create the required `ColumnFamily` instances
-2. Configure each `ColumnFamily`
-3. Call `rocksdb::DB::Open()` to open RocksDB, recording how long it takes
-4. Call `rocksdb::BackupEngine::Open()` to open the `BackupEngine`
-
-# Server
-
-Related files:
-
-- server.h, server.cc
-- redis_cmd.cc
-- worker.h, worker.cc
-
-## Server Initialization
-
-1. Call `Redis::GetCommandList` to obtain the command table, and initialize command statistics
-2. Create the `Worker` and `WorkerThread` worker threads (the number is configurable) that handle requests
-3. Build the replication Workers (used for master-slave synchronization); the number is configurable, and a rate limit can be set
-
-## Server::Start()
-
-1. Start the worker threads and replication threads via `WorkerThread.Start()`
-2. Start the `TaskRunner`, which handles asynchronous `Task`s
-3. Build and start a Cron thread
-4. Build and start a `CompactionChecker` thread that periodically triggers manual `RocksDB` compaction
-
-# Threading Model
-
-Kvrocks consists of Worker threads, a TaskRunner thread pool, a Cron periodic thread, a CompactionChecker thread, and the replication threads.
-
-## Worker Threads
-
-### Worker
-
-Kvrocks uses the `libevent` library for event handling.
-
-Related files:
-
-- worker.h, worker.cc
-
-The worker thread `Worker` is bound to `Redis::Connection`; internally it uses a map to store the fd-to-Connection mapping. Its member functions mainly cover connection-related logic and event-related objects.
-
-Initializing a `Worker` (constructor):
-- `event_base_new()` creates the `event_base`
-- Create a `timer event` (checked every 10s by default) with callback `Worker::TimerCB()`
-- Call `listen()` to listen on the port
-  - When an event is observed, the returned `evconnlistener` is added to the listen event list `listen_events_`; the callback is `Worker::newConnection()`
-
-`Worker::TimerCB()`: timer callback
-- Checks for timed-out clients and kicks them out of the data structures the `Worker` maintains
-
-`Worker::Run()`:
-- `event_base_dispatch` starts the `event_base` event loop and handles ready events
-
-`Worker::newConnection()`:
-- Get a `bufferevent`
-- Create a `Redis::Connection(bev, worker)`
-- Set the read, write, and event callbacks of the `bufferevent` to `Redis::Connection::OnRead()`, `Redis::Connection::OnWrite()`, and `Redis::Connection::OnEvent()` respectively
-- Add the `Redis::Connection` to this `Worker`'s `map<int, Redis::Connection*> conns_`
-- Set the rate limit for replication `Worker`s
-
-### WorkerThread
-
-Related files:
-
-- worker.h, worker.cc
-
-`WorkerThread`: wraps a thread together with a Worker
-
-`WorkerThread.Start()`:
-
-1. Build and start the thread
-2. The thread executes `Worker::Run()`
-
-### Redis::Connection
-
-Related files:
-
-- redis_connection.h, redis_connection.cc
-
-Abstracts a client connection as a `Connection` and encapsulates a series of operations in it; internally it uses libevent's `evbuffer` for reading and writing data.
-
-bufferevent
-
-Data arriving on each connection's socket is stored in the bufferevent's buffers. The three bufferevent callbacks work as follows:
-- The read callback is invoked when the input buffer holds at least the input low-watermark of data. By default the input low-watermark is 0, i.e. the read callback fires as soon as the socket becomes readable
-- The write callback is invoked when the output buffer drops to at most the output low-watermark. By default the output low-watermark is 0, i.e. the write callback fires only after all data in the output buffer has been sent; so the default write callback can be read as "write completed"
-- The event callback is invoked when the connection is closed, times out, or encounters an error
-
-`Connection::OnRead()`: read data, look up the matching commands, then execute them
-- Call `Connection::Input()` to read the contents of the bufferevent
-- Call `Request::Tokenize` to parse the command into tokens stored inside the `Request`
-- Call `Connection::ExecuteCommands` to execute the commands
-
-`Connection::OnWrite()`: the reply to the client has been fully written
-- `Connection::Close()`, which internally calls `Worker::FreeConnection`
-
-`Connection::OnEvent()`: handles errors, connection close, and timeouts
-
-`Connection::ExecuteCommands()`:
-- Call `Server::LookupAndCreateCommand` to look up the command in the command table
-- Check whether the command is valid
-- If the command is valid, check whether its arguments are valid
-  - whether the argument count is valid
-  - whether the argument types (and other aspects) are valid, by calling each command's Parse function (every command overrides the base class `Commander`'s `Parse`)
-- Call the current command's `Execute` function to run it and obtain the reply string
-- Record the command's execution time
-- Handle the `monitor` command logic: call `Server::FeedMonitorConns` to forward the current command to monitor client connections
-- `Connection::Reply()`: call `Redis::Reply()` to write the response into the `bufferevent` and reply to the client
-
-### Redis::Request
-
-Related files:
-
-- redis_request.h, redis_request.cc
-
-Mainly parses the data in the evbuffer into Redis commands for execution
-
-`Request::Tokenize()`:
-
-Reads the data sent by the client out of the `evbuffer` and splits it into tokens
-
-## TaskRunner Thread Pool
-
-Related files:
-
-- task_runner.h, task_runner.cc
-
-A thread pool with a task queue that stores asynchronous tasks (`Task`). The current asynchronous tasks are:
-
-- `Server::AsyncCompactDB()`: called by the `compact` command and `Server::cron`
-- `Server::AsyncBgsaveDB()`: called by the `bgsave` command and `Server::cron`
-- `Server::AsyncPurgeOldBackups()`: called by the `flushbackup` command and `Server::cron`
-- `Server::AsyncScanDBSize()`: called by the `dbsize` command
-
-These are all relatively time-consuming tasks, run asynchronously so that other requests are not blocked.
-
-TaskRunner::Start():
-- Create threads that execute TaskRunner::run
-- TaskRunner::run loops forever, executing the Tasks in the queue
-
-## Cron Periodic Thread
-
-Related files:
-- server.h, server.cc
-
-The Server's periodic function, which runs scheduled tasks (one clock tick is 100ms):
-- `AsyncCompactDB` every 20s
-- `AsyncBgsaveDB` every 20s
-- `AsyncPurgeOldBackups` every 1min
-- `autoResizeBlockAndSST`: dynamically adjusts the RocksDB parameters `target_file_size_base` and `write_buffer_size`, every 30min
-- `Server::cleanupExitedSlaves()`
-
-## CompactionChecker Thread
-
-Related files:
-- compaction_checker.h, compaction_checker.cc
-
-Runs a check every 1min. `CompactionChecker` has two functions, `CompactPubsubAndSlotFiles` and `PickCompactionFiles`:
-- `CompactPubsubAndSlotFiles`: compacts the pubsub-related ColumnFamily
-- `PickCompactionFiles`: obtains the `TableProperties` of SST files, which include these SST attributes:
-
-  1. total number of keys
-  2. number of deleted keys
-  3. start key
-  4. end key
-
-  Manual compaction is triggered for SST files that meet the following conditions:
-
-  1. SST files created more than two days ago
-  2. SST files with a high proportion of deleted keys
-
-Collecting SST properties
-
-`CompactOnExpiredCollector` implements custom SST properties by inheriting from `rocksdb::TablePropertiesCollector`; the corresponding factory class `CompactOnExpiredTableCollectorFactory` is then passed to the storage engine via `rocksdb::ColumnFamilyOptions`.
-
-## Replication Threads
-
-Execute the master-slave replication logic; see [replication-design](./replication-design.md) for details
-
-# Command Implementation
-
-## Encoding
-
-Encodes Redis commands into KV data; see [metadata-design](./metadata-design.md) for details
-
-## Implementation
-
-Following the encoding rules, the data structures defined in redis_xx.h are used to build the encoded KV data, which is finally persisted through the interface provided by the storage engine `Engine::Storage`.
diff --git a/src/version.h.in b/src/version.h.in
index 2a46b8d..ffc9f34 100644
--- a/src/version.h.in
+++ b/src/version.h.in
@@ -1,3 +1,23 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+
#pragma once
#define VERSION "@PROJECT_VERSION@"