This is an automated email from the ASF dual-hosted git repository.
laiyingchun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pegasus-website.git
The following commit(s) were added to refs/heads/master by this push:
new ec20272e Update data-model docs (#45)
ec20272e is described below
commit ec20272e40abf7ef4e1b000eb00d0c19ae1d519e
Author: Yingchun Lai <[email protected]>
AuthorDate: Mon Dec 18 21:06:37 2023 +0800
Update data-model docs (#45)
---
_overview/en/data-model.md | 44 +++++++++++++++++++++++++++++++++++++++++++-
_overview/zh/data-model.md | 14 ++++++--------
2 files changed, 49 insertions(+), 9 deletions(-)
diff --git a/_overview/en/data-model.md b/_overview/en/data-model.md
index e43423bc..d9a63a5c 100644
--- a/_overview/en/data-model.md
+++ b/_overview/en/data-model.md
@@ -2,4 +2,46 @@
permalink: /overview/data-model/
---
-TRANSLATING
+## Introduction
+
+The data model of Pegasus is a simple Key-Value model that does not support
complex schemas. However, to enhance its expressive power, the key is split into
**HashKey** and **SortKey**, forming a composite key (`[HashKey, SortKey] -> Value`),
which is similar to [DynamoDB](https://aws.amazon.com/dynamodb/)'s
[composite primary
key](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks.corecomponents.html#howitworks.corecomponents.primarykey).
+
+### HashKey
+
+Byte string. Similar to the partition key in DynamoDB, HashKey is used to
calculate which partition (a.k.a. shard) the data belongs to. Pegasus applies a
specific hash function to the HashKey and takes the resulting hash value
modulo the number of partitions to obtain the **Partition ID** for the data.
Therefore, data with the same HashKey is always stored in the same partition.
+
+> Note:
+> On the C++ client side, the HashKey length limit is 64KB.
+> On the Java client side, if
[WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12)
is enabled, then the limit is 1KB.
+> On the server side, since Pegasus 2.0.0, if
`[replication]max_allowed_write_size` is set to a non-zero value, the size of the
entire request packet is limited to that value (1MB by default).
+
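The hash-then-modulo step described above can be sketched as follows. This is a minimal illustration, not the Pegasus implementation: `zlib.crc32` is only a stand-in for the specific hash function Pegasus uses, and the function name `partition_id` and the partition count of 8 are hypothetical.

```python
import zlib


def partition_id(hash_key: bytes, partition_count: int) -> int:
    """Map a HashKey to a partition: hash the bytes, then take the modulo."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    # Assumption: crc32 is a stand-in; Pegasus uses its own specific hash function.
    return zlib.crc32(hash_key) % partition_count


# Only the HashKey takes part in the calculation, so every record that
# shares a HashKey maps to the same partition, whatever its SortKey is.
pid_first = partition_id(b"user_1001", 8)
pid_second = partition_id(b"user_1001", 8)
```

Because the SortKey is not an input, all records under one HashKey are co-located, which is what makes per-HashKey batch operations possible.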
+### SortKey
+
+Byte string. Similar to the sort key in DynamoDB, SortKey is used for sorting
data within a partition. In fact, when data is stored internally in RocksDB,
HashKey and SortKey are concatenated to form the RocksDB key.
+> Note:
+> On the C++ client side, there is no limit to the length of SortKey.
+> On the Java client side, if
[WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12)
is enabled, then the limit is 1KB.
+> On the server side, since Pegasus 2.0.0, if
`[replication]max_allowed_write_size` is set to a non-zero value, the size of the
entire request packet is limited to that value (1MB by default).
+
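To illustrate how concatenating HashKey and SortKey yields SortKey ordering inside one HashKey, here is a small sketch. The text above only says the two are concatenated; the 2-byte big-endian length prefix and the names `encode_key` and `decode_key` are assumptions made for this example so that the boundary between the two parts stays unambiguous.

```python
import struct
from typing import Tuple


def encode_key(hash_key: bytes, sort_key: bytes) -> bytes:
    """Build one storage key from HashKey and SortKey (assumed layout)."""
    if len(hash_key) > 0xFFFF:
        raise ValueError("hash_key does not fit the 2-byte length prefix")
    # Assumed layout: [2-byte HashKey length][HashKey][SortKey]
    return struct.pack(">H", len(hash_key)) + hash_key + sort_key


def decode_key(raw: bytes) -> Tuple[bytes, bytes]:
    """Split a storage key back into (HashKey, SortKey)."""
    (hash_len,) = struct.unpack(">H", raw[:2])
    return raw[2:2 + hash_len], raw[2 + hash_len:]


# Keys that share a HashKey share a prefix, so sorting the encoded keys
# byte-wise orders them by SortKey.
keys = [encode_key(b"user_1", sk) for sk in (b"name", b"age", b"email")]
ordered_sort_keys = [decode_key(k)[1] for k in sorted(keys)]
```

Without some delimiter or length prefix, `(b"ab", b"c")` and `(b"a", b"bc")` would encode to the same bytes, which is why the sketch does not use plain concatenation.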
+### Value
+
+Byte string.
+> Note:
+> On the C++ client side, there is no limit to the length of the Value.
+> On the Java client side, if
[WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12)
is enabled, then the limit is 400KB.
+> On the server side, since Pegasus 2.0.0, if
`[replication]max_allowed_write_size` is set to a non-zero value, the size of the
entire request packet is limited to that value (1MB by default).
+
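The size limits quoted in the notes above can be collected into a simple pre-flight check. This is a hedged sketch, not client code: the function `check_write` is hypothetical, "KB" is assumed to mean 1024 bytes, and the sum of the three fields is only a lower bound on the real request packet size.

```python
from typing import List

KB = 1024  # assumption: "KB" in the notes means 1024 bytes

# Limits quoted for the Java client when WriteLimiter is enabled.
MAX_HASH_KEY = 1 * KB
MAX_SORT_KEY = 1 * KB
MAX_VALUE = 400 * KB
# Default quoted for the server-side [replication]max_allowed_write_size.
MAX_REQUEST = 1024 * KB


def check_write(hash_key: bytes, sort_key: bytes, value: bytes) -> List[str]:
    """Return a list of limit violations; an empty list means none were found."""
    problems = []
    if len(hash_key) > MAX_HASH_KEY:
        problems.append("HashKey exceeds 1KB")
    if len(sort_key) > MAX_SORT_KEY:
        problems.append("SortKey exceeds 1KB")
    if len(value) > MAX_VALUE:
        problems.append("Value exceeds 400KB")
    # The payload is a lower bound: the real packet also carries protocol overhead.
    if len(hash_key) + len(sort_key) + len(value) > MAX_REQUEST:
        problems.append("payload alone exceeds the 1MB request limit")
    return problems


ok_result = check_write(b"user_1001", b"name", b"Alice")
bad_result = check_write(b"h" * 2000, b"name", b"Alice")
```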
+
+## Pegasus vs. HBase
+
+Although the Pegasus data model is not as semantically rich as HBase's tabular
model, it can still meet most applications' needs, thanks to its
HashKey+SortKey composite key design.
+For example, users can treat HashKey as a row key and SortKey as an attribute
name or column name, so that all records sharing the same HashKey can be viewed
as one row, which expresses the concept of a row in HBase.
+With this in mind, Pegasus provides not only the
`get`/`set`/`del` interfaces for accessing individual records, but also
the `multi_get`/`multi_set`/`multi_del` interfaces for batch access to records
under the same HashKey. These batch interfaces provide single-row atomic
semantics.
+
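The row-style usage described above can be sketched with a toy in-memory model. This is not the Pegasus client API: the names `multi_set`/`multi_get`/`multi_del` are borrowed from the text, but the signatures, the `Table` type, and the sample data are hypothetical, and the single-row atomicity of the real interfaces is not modeled.

```python
from typing import Dict, Iterable

# Toy model: HashKey (row key) -> {SortKey (column name) -> Value}
Table = Dict[bytes, Dict[bytes, bytes]]


def multi_set(table: Table, hash_key: bytes, columns: Dict[bytes, bytes]) -> None:
    """Write several columns of one row."""
    table.setdefault(hash_key, {}).update(columns)


def multi_get(table: Table, hash_key: bytes, sort_keys: Iterable[bytes]) -> Dict[bytes, bytes]:
    """Read the requested columns of one row; missing columns are skipped."""
    row = table.get(hash_key, {})
    return {sk: row[sk] for sk in sort_keys if sk in row}


def multi_del(table: Table, hash_key: bytes, sort_keys: Iterable[bytes]) -> int:
    """Delete the requested columns of one row and return how many existed."""
    row = table.get(hash_key, {})
    deleted = sum(1 for sk in sort_keys if row.pop(sk, None) is not None)
    if not row:
        table.pop(hash_key, None)  # drop the row once it has no columns left
    return deleted


table: Table = {}
multi_set(table, b"user_1001", {b"name": b"Alice", b"city": b"Beijing"})
row = multi_get(table, b"user_1001", [b"name", b"age"])
removed = multi_del(table, b"user_1001", [b"city"])
```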
+
+## Pegasus vs. Redis
+
+Although Pegasus does not support rich data structures such as
`List`/`Set`/`Hash` like Redis, users can still use Pegasus to implement
similar semantics.
+For example, users can map HashKey to the Redis `key` and use SortKey as the
`field` of a Hash (or the `member` of a Set) to implement the equivalent of a
Redis Hash (or Set).
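The mapping above can be sketched with a toy store keyed by `(HashKey, SortKey)`. This is an illustration only: the helper names mirror Redis commands for readability, the module-level `store` stands in for a Pegasus table, and storing an empty value for Set members is an assumption of this sketch.

```python
from typing import Dict, Optional, Set, Tuple

# Toy stand-in for a table: (HashKey, SortKey) -> Value
store: Dict[Tuple[bytes, bytes], bytes] = {}


def hset(key: bytes, field: bytes, value: bytes) -> None:
    """Redis-like HSET: HashKey is the key, SortKey is the field."""
    store[(key, field)] = value


def hget(key: bytes, field: bytes) -> Optional[bytes]:
    """Redis-like HGET; returns None when the field is absent."""
    return store.get((key, field))


def hgetall(key: bytes) -> Dict[bytes, bytes]:
    """Redis-like HGETALL: every SortKey/Value pair under one HashKey."""
    return {sk: v for (hk, sk), v in store.items() if hk == key}


def sadd(key: bytes, member: bytes) -> None:
    """Redis-like SADD: SortKey is the member, the value is left empty."""
    store[(key, member)] = b""


def smembers(key: bytes) -> Set[bytes]:
    """Redis-like SMEMBERS: all SortKeys under one HashKey."""
    return {sk for (hk, sk) in store if hk == key}


hset(b"profile:1", b"name", b"Alice")
hset(b"profile:1", b"city", b"Beijing")
sadd(b"tags:1", b"db")
sadd(b"tags:1", b"db")  # adding the same member twice keeps one entry
```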
diff --git a/_overview/zh/data-model.md b/_overview/zh/data-model.md
index a70e1ff0..631dd5e2 100644
--- a/_overview/zh/data-model.md
+++ b/_overview/zh/data-model.md
@@ -2,23 +2,21 @@
permalink: /overview/data-model/
---
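-## Data model introduction
+## Introduction
-The data model of Pegasus is very simple: it is a Key-Value model and does not support complex schemas. However, to enhance its expressive power, the key is split into
**HashKey** and **SortKey**, forming a composite key (`[HashKey, SortKey] -> Value`), which is very similar to the
[_composite primary
key_](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html#HowItWorks.CoreComponents.PrimaryKey) (partition
key and sort key) provided by [DynamoDB](https://aws.amazon.com/dynamodb/).
-
-The reasons for this design are:
-* Pegasus uses fixed hash-based partitioning, so there must be a way to compute the partition ID of the data. The simplest approach is to have the user provide a
HashKey and derive the partition ID from it with a hash function.
-* A plain `HashKey -> Value` scheme would be too weak in expressive power and inconvenient for applications to use.
+The data model of Pegasus is very simple: it is a Key-Value model and does not support complex schemas. However, to enhance its expressive power, the key is split into
**HashKey** and **SortKey**, forming a composite key (`[HashKey, SortKey] -> Value`), which is similar to the
[composite primary
key](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html#HowItWorks.CoreComponents.PrimaryKey) (partition
key and sort key) in [DynamoDB](https://aws.amazon.com/dynamodb/).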
### HashKey
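Byte string. Similar to the partition key in DynamoDB, HashKey is used to calculate which partition the data belongs to. Pegasus applies a specific hash
function to the HashKey to obtain a hash value, then takes it modulo the number of partitions to get the **Partition ID** of the data. Therefore, data with the same
HashKey is always stored in the same partition.
->
Note: On the C++ client side, the HashKey length limit is 64KB. On the Java client side, if [WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12) is enabled, the limit is 1KB.
+> Note:
+> On the C++ client side, the HashKey length limit is 64KB.
+>
On the Java client side, if [WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12) is enabled, the limit is 1KB.
> On the server side, since Pegasus 2.0.0, if `[replication]max_allowed_write_size`
> is set to a non-zero value, the size of the entire request packet is limited to that value (1MB by default).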
### SortKey
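-Byte string. Similar to the sort key in DynamoDB, SortKey is used for sorting data within a partition. Data with the same HashKey is stored together and ordered by the
byte order of SortKey. In fact, when data is stored internally in RocksDB, HashKey and SortKey are concatenated to form the RocksDB key.
+Byte string. Similar to the sort key in DynamoDB, SortKey is used for sorting data within a partition. In fact, when data is stored internally in RocksDB,
HashKey and SortKey are concatenated to form the RocksDB key.
> Note: On the C++ client side, there is no limit on the SortKey length. On the Java client side, if [WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12) is enabled, the limit is 1KB.
> On the server side, since Pegasus 2.0.0, if `[replication]max_allowed_write_size`
> is set to a non-zero value, the size of the entire request packet is limited to that value (1MB by default).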