This is an automated email from the ASF dual-hosted git repository.

laiyingchun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pegasus-website.git


The following commit(s) were added to refs/heads/master by this push:
     new ec20272e Update data-model docs (#45)
ec20272e is described below

commit ec20272e40abf7ef4e1b000eb00d0c19ae1d519e
Author: Yingchun Lai <[email protected]>
AuthorDate: Mon Dec 18 21:06:37 2023 +0800

    Update data-model docs (#45)
---
 _overview/en/data-model.md | 44 +++++++++++++++++++++++++++++++++++++++++++-
 _overview/zh/data-model.md | 14 ++++++--------
 2 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/_overview/en/data-model.md b/_overview/en/data-model.md
index e43423bc..d9a63a5c 100644
--- a/_overview/en/data-model.md
+++ b/_overview/en/data-model.md
@@ -2,4 +2,46 @@
 permalink: /overview/data-model/
 ---
 
-TRANSLATING
+## Introduction
+
+The data model of Pegasus is a simple Key-Value model that does not support complex schemas. To increase its expressive power, however, the key is split into a **HashKey** and a **SortKey**, forming a composite key (`[HashKey, SortKey] -> Value`), similar to the [composite primary key](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks.corecomponents.html#howitworks.corecomponents.primarykey) in [DynamoDB](https://aws.amazon.com/dynamodb/).
+
+### HashKey
+
+Byte string. Similar to the partition key in DynamoDB, the HashKey determines which partition (a.k.a. shard) a record belongs to. Pegasus applies a specific hash function to the HashKey and takes the resulting hash value modulo the number of partitions to obtain the record's **Partition ID**. Therefore, records with the same HashKey are always stored in the same partition.
+
+> Note:
+> On the C++ client side, the HashKey length is limited to 64KB.
+> On the Java client side, if [WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12) is enabled, the limit is 1KB.
+> On the server side, since Pegasus 2.0.0, if `[replication]max_allowed_write_size` is set to a non-zero value, the size of the entire request packet is limited to that value (1MB by default).
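The routing rule described above (hash the HashKey, then take the result modulo the partition count) can be sketched in a few lines of Python. This is an illustration only: `zlib.crc32` is a stand-in chosen for this sketch, not necessarily the hash function Pegasus uses, and `partition_id` is a hypothetical helper rather than part of any Pegasus client.

```python
import zlib


def partition_id(hash_key: bytes, partition_count: int) -> int:
    """Return the partition a HashKey maps to: hash(HashKey) mod partition_count.

    zlib.crc32 is a stand-in hash for illustration; Pegasus's own hash
    function may differ, so real partition IDs will not match this sketch.
    """
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    return zlib.crc32(hash_key) % partition_count


# The SortKey plays no part in routing, so every record under
# b"user_42" is sent to the same partition.
print(partition_id(b"user_42", 8))
```

Because only the HashKey is fed to the hash, all records that share a HashKey are co-located, whatever their SortKeys are.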
+
+### SortKey
+
+Byte string. Similar to the sort key in DynamoDB, the SortKey orders records within a partition: records with the same HashKey are stored together, sorted by the byte order of their SortKeys. Internally, Pegasus concatenates the HashKey and the SortKey to form the RocksDB key.
+
+> Note:
+> On the C++ client side, the SortKey length is not limited.
+> On the Java client side, if [WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12) is enabled, the limit is 1KB.
+> On the server side, since Pegasus 2.0.0, if `[replication]max_allowed_write_size` is set to a non-zero value, the size of the entire request packet is limited to that value (1MB by default).
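The SortKey text above says only that the HashKey and SortKey are concatenated into the RocksDB key. A bare concatenation would be ambiguous (`ab` + `c` and `a` + `bc` both give `abc`), so the sketch below assumes a 2-byte big-endian HashKey-length prefix. Treat that layout, and the helper names `encode_key`/`decode_key`, as assumptions for illustration rather than Pegasus's documented on-disk format. The sketch also shows why records under one HashKey come out sorted by SortKey bytes.

```python
import struct
from typing import Tuple


def encode_key(hash_key: bytes, sort_key: bytes) -> bytes:
    """Concatenate HashKey and SortKey into one storage key.

    Assumed layout (illustrative): 2-byte big-endian HashKey length,
    then the HashKey bytes, then the SortKey bytes.
    """
    if len(hash_key) > 0xFFFF:
        raise ValueError("HashKey too long for a 2-byte length prefix")
    return struct.pack(">H", len(hash_key)) + hash_key + sort_key


def decode_key(key: bytes) -> Tuple[bytes, bytes]:
    """Split a key produced by encode_key back into (HashKey, SortKey)."""
    (hash_len,) = struct.unpack_from(">H", key, 0)
    return key[2:2 + hash_len], key[2 + hash_len:]


# Keys sharing a HashKey share a prefix, so sorting the encoded keys
# orders those records by the byte order of their SortKeys.
keys = [encode_key(b"user_42", sk) for sk in (b"name", b"age", b"email")]
print([decode_key(k)[1] for k in sorted(keys)])  # [b'age', b'email', b'name']
```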
+
+### Value
+
+Byte string.
+
+> Note:
+> On the C++ client side, the Value length is not limited.
+> On the Java client side, if [WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12) is enabled, the limit is 400KB.
+> On the server side, since Pegasus 2.0.0, if `[replication]max_allowed_write_size` is set to a non-zero value, the size of the entire request packet is limited to that value (1MB by default).
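The size limits listed in the HashKey, SortKey, and Value notes above can be gathered into one client-side pre-check. This is a sketch under stated assumptions: the `Limits` type and the helper names are hypothetical, 1KB is taken as 1024 bytes, each limit is treated as an inclusive maximum, and the server-side rule is simplified to comparing a caller-supplied request size with `max_allowed_write_size`, since the real packet size depends on the RPC encoding.

```python
from dataclasses import dataclass
from typing import List, Optional

KB = 1024
MB = 1024 * KB


@dataclass(frozen=True)
class Limits:
    """Per-field maximum sizes in bytes; None means no client-side limit."""
    hash_key: Optional[int]
    sort_key: Optional[int]
    value: Optional[int]


# Figures transcribed from the notes above.
CPP_CLIENT = Limits(hash_key=64 * KB, sort_key=None, value=None)
JAVA_CLIENT_WITH_WRITE_LIMITER = Limits(hash_key=1 * KB, sort_key=1 * KB, value=400 * KB)


def client_violations(limits: Limits, hash_key: bytes, sort_key: bytes, value: bytes) -> List[str]:
    """List every field that exceeds its client-side limit (empty list means OK)."""
    problems = []
    for name, data, limit in (
        ("HashKey", hash_key, limits.hash_key),
        ("SortKey", sort_key, limits.sort_key),
        ("Value", value, limits.value),
    ):
        if limit is not None and len(data) > limit:
            problems.append(f"{name} is {len(data)} bytes; limit is {limit}")
    return problems


def server_accepts(request_size: int, max_allowed_write_size: int = 1 * MB) -> bool:
    """Simplified server-side rule (Pegasus >= 2.0.0): 0 disables the check."""
    return max_allowed_write_size == 0 or request_size <= max_allowed_write_size


print(client_violations(JAVA_CLIENT_WITH_WRITE_LIMITER, b"h" * 2048, b"s", b"v"))
# ['HashKey is 2048 bytes; limit is 1024']
```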
+
+![pegasus-data-model](/assets/images/pegasus-data-model.png){:class="img-responsive docs-image"}
+
+## Pegasus vs. HBase
+
+Although Pegasus's data model is not as semantically rich as HBase's tabular model, its HashKey + SortKey composite key design still meets the needs of most applications.
+For example, users can treat the HashKey as a row key and the SortKey as an attribute (or column) name, so that all records sharing a HashKey can be viewed as one row, mirroring the concept of a row in HBase.
+With this in mind, Pegasus provides not only the `get`/`set`/`del` interfaces for accessing individual records, but also the `multi_get`/`multi_set`/`multi_del` interfaces for accessing multiple records under the same HashKey. These batch interfaces provide single-row atomic semantics, which makes them convenient to use.
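To make the row analogy above concrete, here is a toy in-memory model of a table in which each HashKey is a row and each SortKey a column. It is not the Pegasus client API: `ToyTable` and its methods are hypothetical and only echo the interface names mentioned above, and a single lock stands in for the server's single-row atomicity so that a `multi_set` is never observed half-applied.

```python
import threading
from typing import Dict, Iterable, Optional


class ToyTable:
    """In-memory stand-in for a Pegasus table: [HashKey, SortKey] -> Value."""

    def __init__(self) -> None:
        self._rows: Dict[bytes, Dict[bytes, bytes]] = {}
        self._lock = threading.Lock()

    def set(self, hash_key: bytes, sort_key: bytes, value: bytes) -> None:
        with self._lock:
            self._rows.setdefault(hash_key, {})[sort_key] = value

    def get(self, hash_key: bytes, sort_key: bytes) -> Optional[bytes]:
        with self._lock:
            return self._rows.get(hash_key, {}).get(sort_key)

    def multi_set(self, hash_key: bytes, kvs: Dict[bytes, bytes]) -> None:
        # The whole batch is applied under one lock, so readers see either
        # none or all of it (a stand-in for single-row atomicity).
        with self._lock:
            self._rows.setdefault(hash_key, {}).update(kvs)

    def multi_get(self, hash_key: bytes, sort_keys: Optional[Iterable[bytes]] = None) -> Dict[bytes, bytes]:
        # With no sort_keys, return the whole "row" in SortKey byte order.
        with self._lock:
            row = self._rows.get(hash_key, {})
            wanted = sorted(row) if sort_keys is None else sort_keys
            return {sk: row[sk] for sk in wanted if sk in row}

    def multi_del(self, hash_key: bytes, sort_keys: Iterable[bytes]) -> None:
        with self._lock:
            row = self._rows.get(hash_key, {})
            for sk in sort_keys:
                row.pop(sk, None)


table = ToyTable()
table.multi_set(b"user_42", {b"name": b"Alice", b"city": b"Beijing"})
print(table.multi_get(b"user_42"))  # {b'city': b'Beijing', b'name': b'Alice'}
```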
+
+![pegasus-data-model](/assets/images/pegasus-data-model-sample.png){:class="img-responsive docs-image"}
+
+## Pegasus vs. Redis
+
+Although Pegasus does not natively support rich data structures such as Redis's `List`/`Set`/`Hash`, users can still implement similar semantics on top of Pegasus.
+For example, to implement a Redis Hash, users can map the Redis `key` to the HashKey and each Hash `field` to a SortKey; a Set can be modeled the same way, with each `member` stored as a SortKey.
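The Hash mapping described above, and the analogous Set mapping, can be sketched over a plain dictionary standing in for a Pegasus table. The helper names mirror Redis commands (`HSET`, `HGET`, `HGETALL`, `SADD`, `SMEMBERS`) but are hypothetical functions written for this example, not Pegasus or Redis APIs; storing an empty Value for Set members is likewise an assumption, since only the SortKey carries information there.

```python
from typing import Dict, List, Optional

# Stand-in for a Pegasus table: HashKey -> {SortKey: Value}.
Table = Dict[bytes, Dict[bytes, bytes]]


def hset(table: Table, key: bytes, field: bytes, value: bytes) -> None:
    # Redis key -> HashKey, Hash field -> SortKey, field value -> Value.
    table.setdefault(key, {})[field] = value


def hget(table: Table, key: bytes, field: bytes) -> Optional[bytes]:
    return table.get(key, {}).get(field)


def hgetall(table: Table, key: bytes) -> Dict[bytes, bytes]:
    # Sorting mimics the SortKey byte order Pegasus keeps within a HashKey.
    row = table.get(key, {})
    return {field: row[field] for field in sorted(row)}


def sadd(table: Table, key: bytes, member: bytes) -> None:
    # Set member -> SortKey; the Value is unused, so store an empty byte string.
    table.setdefault(key, {})[member] = b""


def smembers(table: Table, key: bytes) -> List[bytes]:
    return sorted(table.get(key, {}))


t: Table = {}
hset(t, b"user:42", b"name", b"Alice")
hset(t, b"user:42", b"age", b"30")
sadd(t, b"tags:42", b"redis")
sadd(t, b"tags:42", b"db")
print(hgetall(t, b"user:42"))  # {b'age': b'30', b'name': b'Alice'}
print(smembers(t, b"tags:42"))  # [b'db', b'redis']
```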
diff --git a/_overview/zh/data-model.md b/_overview/zh/data-model.md
index a70e1ff0..631dd5e2 100644
--- a/_overview/zh/data-model.md
+++ b/_overview/zh/data-model.md
@@ -2,23 +2,21 @@
 permalink: /overview/data-model/
 ---
 
-## Introduction to the Data Model
+## Introduction
 
-Pegasus's data model is very simple: it is a Key-Value model that does not support complex schemas. However, to enhance its expressive power, the Key is split into **HashKey** and **SortKey**, i.e. a composite key (`[HashKey, SortKey] -> Value`), which is very similar to the [_composite primary key_](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html#HowItWorks.CoreComponents.PrimaryKey) (partition key and sort key) provided by [DynamoDB](https://aws.amazon.com/dynamodb/).
-
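-The reasons for this design are:
-* Pegasus uses fixed, hash-based sharding, so there must be a way to compute each record's partition ID. The simplest approach is to have the user provide a HashKey and compute the ID from it with a hash function.
-* A plain `HashKey -> Value` model has limited expressive power and is inconvenient for applications.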
+Pegasus's data model is very simple: it is a Key-Value model that does not support complex schemas. However, to enhance its expressive power, the Key is split into **HashKey** and **SortKey**, i.e. a composite key (`[HashKey, SortKey] -> Value`), which is similar to the [composite primary key](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html#HowItWorks.CoreComponents.PrimaryKey) (partition key and sort key) in [DynamoDB](https://aws.amazon.com/dynamodb/).
 
 ### HashKey
 
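 Byte string. Similar to the partition key in DynamoDB, the HashKey is used to calculate which partition the data belongs to. Pegasus applies a specific hash function to the HashKey to compute a hash value, then takes it modulo the number of partitions to obtain the data's **Partition ID**. Therefore, data with the same HashKey is always stored in the same partition.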
-> Note: On the C++ client side, the HashKey length is limited to 64KB. On the Java client side, if [WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12) is enabled, the limit is 1KB.
+> Note:
+> On the C++ client side, the HashKey length is limited to 64KB.
+> On the Java client side, if [WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12) is enabled, the limit is 1KB.
 > On the server side, since Pegasus 2.0.0, if `[replication]max_allowed_write_size`
 > is set to a non-zero value, the size of the entire request packet is limited to that value, which defaults to 1MB.
 
 ### SortKey
 
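-Byte string. Similar to the sort key in DynamoDB, the SortKey is used to sort data within a partition. Data with the same HashKey is stored together and sorted by the byte order of the SortKey. In fact, when storing data internally in RocksDB, we concatenate the HashKey and SortKey to form the RocksDB key.
+Byte string. Similar to the sort key in DynamoDB, the SortKey is used to sort data within a partition. In fact, when storing data internally in RocksDB, we concatenate the HashKey and SortKey to form the RocksDB key.
 > Note: On the C++ client side, the SortKey length is not limited. On the Java client side, if [WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12) is enabled, the limit is 1KB.
 > On the server side, since Pegasus 2.0.0, if `[replication]max_allowed_write_size`
 > is set to a non-zero value, the size of the entire request packet is limited to that value, which defaults to 1MB.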
 

