[hudi] branch asf-site updated: [MINOR] Fixing key generators blog content (#2739)

sivabalan Tue, 20 Apr 2021 09:17:51 -0700

This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 860abd0  [MINOR] Fixing key generators blog content (#2739)
860abd0 is described below

commit 860abd04cbb3e78265ba9a300bb0cd849fff7e44
Author: Sivabalan Narayanan <[email protected]>
AuthorDate: Tue Apr 20 12:17:30 2021 -0400

    [MINOR] Fixing key generators blog content (#2739)
---
 docs/_posts/2021-02-13-hudi-key-generators.md | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/docs/_posts/2021-02-13-hudi-key-generators.md b/docs/_posts/2021-02-13-hudi-key-generators.md
index 5076ec6..fc3faa8 100644
--- a/docs/_posts/2021-02-13-hudi-key-generators.md
+++ b/docs/_posts/2021-02-13-hudi-key-generators.md
@@ -5,18 +5,21 @@ author: shivnarayan
 category: blog
 ---
 
-Every record in Hudi is uniquely identified by a HoodieKey, which is a pair of record key and partition path where the
-record belongs to. Hudi has imposed this constraint so that updates and deletes can be applied to the record of interest.
-Hudi relies on the partition path field to partition your dataset and records within a partition have unique record keys.
-Since uniqueness is guaranteed only within the partition, there could be records with same record keys across different
-partitions. One should choose the partition field wisely as it could be a determining factor for your ingestion and
-query latency.
+Every record in Hudi is uniquely identified by a primary key, which is a pair of record key and partition path where
+the record belongs to. Using primary keys, Hudi can a) impose a partition-level uniqueness integrity constraint and
+b) enable fast updates and deletes on records. One should choose the partitioning scheme wisely, as it can be a
+determining factor for your ingestion and query latency.
+
+In general, Hudi supports both partitioned and global indexes. For a dataset with a partitioned index (which is most
+commonly used), each record is uniquely identified by the pair of record key and partition path. But for a dataset with a
+global index, each record is uniquely identified by the record key alone, so there are no duplicate record keys across
+partitions.
 
 ## Key Generators
 
-Hudi exposes a number of out of the box key generators that customers can use based on their need. Or can have their
-own implementation for the KeyGenerator. This blog goes over all different types of key generators that are readily
-available to use.
+Hudi provides several key generators out of the box that users can choose from based on their needs, and also offers a
+pluggable interface so users can implement and use their own KeyGenerator. This blog goes over all the different types of
+key generators that are readily available.
 
 
 [Here](https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenerator.java) is the interface for KeyGenerator in Hudi for your reference.
