vinothchandar commented on a change in pull request #2739:
URL: https://github.com/apache/hudi/pull/2739#discussion_r616194063
##########
File path: docs/_posts/2021-02-13-hudi-key-generators.md
##########
@@ -5,18 +5,16 @@ author: shivnarayan
category: blog
---
-Every record in Hudi is uniquely identified by a HoodieKey, which is a pair of
record key and partition path where the
-record belongs to. Hudi has imposed this constraint so that updates and
deletes can be applied to the record of interest.
-Hudi relies on the partition path field to partition your dataset and records
within a partition have unique record keys.
-Since uniqueness is guaranteed only within the partition, there could be
records with same record keys across different
-partitions. One should choose the partition field wisely as it could be a
determining factor for your ingestion and
-query latency.
+Every record in Hudi is uniquely identified by a primary key, which is a pair
of record key and partition path where
+the record belongs to. Using primary keys, Hudi can impose a) partition level
uniqueness integrity constraint
+b) enable fast updates and deletes on records. One should choose the
partitioning scheme wisely as it could be a
+determining factor for your ingestion and query latency.
## Key Generators
-Hudi exposes a number of out of the box key generators that customers can use
based on their need. Or can have their
-own implementation for the KeyGenerator. This blog goes over all different
types of key generators that are readily
-available to use.
+Hudi provides several key generators out of the box that customers can use
based on their need while having a pluggable
Review comment:
General tip: please avoid "customer". "User" is what we mean.
##########
File path: docs/_posts/2021-02-13-hudi-key-generators.md
##########
@@ -5,18 +5,16 @@ author: shivnarayan
category: blog
---
-Every record in Hudi is uniquely identified by a HoodieKey, which is a pair of
record key and partition path where the
-record belongs to. Hudi has imposed this constraint so that updates and
deletes can be applied to the record of interest.
-Hudi relies on the partition path field to partition your dataset and records
within a partition have unique record keys.
-Since uniqueness is guaranteed only within the partition, there could be
records with same record keys across different
-partitions. One should choose the partition field wisely as it could be a
determining factor for your ingestion and
-query latency.
+Every record in Hudi is uniquely identified by a primary key, which is a pair
of record key and partition path where
Review comment:
Yes, this makes sense against the backdrop of column/secondary keys.