[GitHub] [hudi] YuweiXiao commented on a diff in pull request #4480: [HUDI-3123] consistent hashing index: basic write path (upsert/insert)

GitBox Tue, 10 May 2022 18:41:25 -0700


YuweiXiao commented on code in PR #4480:
URL: https://github.com/apache/hudi/pull/4480#discussion_r869799400



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java:
##########
@@ -216,19 +216,40 @@ public class HoodieIndexConfig extends HoodieConfig {
   /**
    * ***** Bucket Index Configs *****
    * Bucket Index is targeted to locate the record fast by hash in big data scenarios.
-   * The current implementation is a basic version, so there are some constraints:
-   * 1. Unsupported operation: bulk insert, cluster and so on.
-   * 2. Bucket num change requires rewriting the partition.
-   * 3. Predict the table size and future data growth well to set a reasonable bucket num.
-   * 4. A bucket size is recommended less than 3GB and avoid bing too small.
-   * more details and progress see [HUDI-3039].
-   */
-  // Bucket num equals file groups num in each partition.
-  // Bucket num can be set according to partition size and file group size.
+   * A bucket size is recommended less than 3GB to avoid being too small.
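The Javadoc above describes locating a record by hash, with a fixed number of buckets per partition (one file group per bucket). A minimal Java sketch of that idea, assuming plain modulo placement over `String.hashCode()`; the class `SimpleBucketSketch` and its methods are hypothetical and this is not Hudi's actual hashing function. It also shows why the removed constraint "bucket num change requires rewriting the partition" holds for this layout: changing the bucket count reassigns most keys.

```java
/**
 * Hypothetical illustration (not Hudi's code): a fixed-size bucket index maps
 * each record key straight to a bucket, i.e. a file group, by hashing.
 */
public class SimpleBucketSketch {

    /** Returns the bucket for a record key, in the range [0, numBuckets). */
    static int bucketId(String recordKey, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        // Clear the sign bit so a negative hashCode still yields a valid index.
        return (recordKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    /** Counts the keys whose bucket changes when the bucket count changes. */
    static int movedKeys(String[] keys, int oldNumBuckets, int newNumBuckets) {
        int moved = 0;
        for (String key : keys) {
            if (bucketId(key, oldNumBuckets) != bucketId(key, newNumBuckets)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        String[] keys = new String[1000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = "key-" + i;
        }
        // With modulo placement, most keys move when the bucket count changes,
        // so their records would have to be rewritten into new file groups.
        System.out.println("keys moved from 8 to 9 buckets: " + movedKeys(keys, 8, 9));
    }
}
```

Lookup is a single hash computation with no index scan, which is the speed benefit the Javadoc refers to; the cost is that the bucket count is effectively fixed once data is written.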

Review Comment:
   For consistent hashing, users can configure the bucket split/merge thresholds to control the bucket size, so 3GB need not be a hard constraint; it is a recommendation.



