Re: [PR] [HUDI-8678] feat: improve consistent-bucket resizing performance by reducing unnecessary record collecting [hudi]

via GitHub Thu, 12 Dec 2024 00:17:22 -0800


TheR1sing3un commented on code in PR #12451:
URL: https://github.com/apache/hudi/pull/12451#discussion_r1881573645



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/ConsistentHashingBucketInsertPartitioner.java:
##########
@@ -29,5 +29,14 @@ public interface ConsistentHashingBucketInsertPartitioner {
    * @param partition partition to set Consistent Hashing nodes
    * @param nodes     nodes from clustering plan
    */
+  /**
+   * Set pending consistent hashing for partition, only used in executing clustering
+   * When call this method, the bulk insert will directly use the pending metadata as the consistent hash metadata for writing data to after-resizing buckets.
+   * Used in the case of executing bulk insert.
+   * NOTE: This method should be called before the bulk insert operation, and will skip building identifiers from records, just use the pending metadata.
+   * For which not calling this method, the bulk insert will use the committed metadata as the bucket metadata and disallow writing data to the pending-resizing buckets.
+   * @param partition partition to set Consistent Hashing nodes
+   * @param nodes     nodes from clustering plan
+   */

Review Comment:
   > Then shall we move the instantiation of the metadata once in the constructor? Not sure if there are some SE/DE issues.
   
   Do you mean that we remove the method `ConsistentHashingBucketInsertPartitioner::addHashingChildrenNodes`, and then, in the clustering scenario, instantiate the partitioner directly with this metadata passed as constructor parameters?
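
   To make the question concrete, here is a rough sketch of what the constructor-based variant might look like. It is only an illustration under assumptions: `SketchNode`, `ConstructorInjectedPartitioner`, and `ConstructorInjectionSketch` are hypothetical names, not Hudi classes, and the real partitioner has more responsibilities than shown. The round-trip helper is there only to probe the SE/DE concern with plain Java serialization, which may differ from what the engine actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a hashing node; NOT Hudi's ConsistentHashingNode.
final class SketchNode implements Serializable {
  private static final long serialVersionUID = 1L;
  final int hashValue;
  final String fileIdPrefix;

  SketchNode(int hashValue, String fileIdPrefix) {
    this.hashValue = hashValue;
    this.fileIdPrefix = fileIdPrefix;
  }
}

// Hypothetical partitioner that receives the pending nodes once, at construction,
// instead of through a setter such as addHashingChildrenNodes.
final class ConstructorInjectedPartitioner implements Serializable {
  private static final long serialVersionUID = 1L;
  private final Map<String, List<SketchNode>> pendingNodesByPartition;

  ConstructorInjectedPartitioner(Map<String, List<SketchNode>> pendingNodesByPartition) {
    // Copy into plain HashMap/ArrayList so the field holds serializable collections
    // regardless of what the caller passed in.
    Map<String, List<SketchNode>> copy = new HashMap<>();
    for (Map.Entry<String, List<SketchNode>> e : pendingNodesByPartition.entrySet()) {
      copy.put(e.getKey(), new ArrayList<>(e.getValue()));
    }
    this.pendingNodesByPartition = copy;
  }

  // True when pending (after-resizing) metadata was supplied for this partition.
  boolean usesPendingMetadata(String partition) {
    return pendingNodesByPartition.containsKey(partition);
  }

  // Read-only view of the pending nodes; empty when none were supplied.
  List<SketchNode> pendingNodes(String partition) {
    List<SketchNode> nodes = pendingNodesByPartition.get(partition);
    return nodes == null
        ? Collections.<SketchNode>emptyList()
        : Collections.unmodifiableList(nodes);
  }
}

final class ConstructorInjectionSketch {
  // Serializes and deserializes a value with plain Java serialization.
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
               new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("serialization round trip failed", e);
    }
  }

  public static void main(String[] args) {
    Map<String, List<SketchNode>> pending = new HashMap<>();
    List<SketchNode> nodes = new ArrayList<>();
    nodes.add(new SketchNode(100, "prefix-a"));
    nodes.add(new SketchNode(200, "prefix-b"));
    pending.put("2024/12/12", nodes);

    ConstructorInjectedPartitioner copy =
        roundTrip(new ConstructorInjectedPartitioner(pending));
    System.out.println(copy.usesPendingMetadata("2024/12/12"));
    System.out.println(copy.pendingNodes("2024/12/12").size());
  }
}
```

   In this shape the partitioner is immutable after construction, so there is no ordering requirement like "call the setter before the bulk insert". The trade-off is that everything passed to the constructor has to be serializable if the partitioner is shipped to executors.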



