[GitHub] [hudi] YuweiXiao commented on a diff in pull request #6737: [HUDI-4373] Flink Consistent hashing bucket index write path code

GitBox Tue, 11 Oct 2022 01:11:32 -0700


YuweiXiao commented on code in PR #6737:
URL: https://github.com/apache/hudi/pull/6737#discussion_r991984275



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/Pipelines.java:
##########
@@ -316,9 +318,8 @@ public static DataStream<HoodieRecord> rowDataToHoodieRecord(Configuration conf,
   public static DataStream<Object> hoodieStreamWrite(Configuration conf, DataStream<HoodieRecord> dataStream) {
     if (OptionsResolver.isBucketIndexType(conf)) {
       WriteOperatorFactory<HoodieRecord> operatorFactory = BucketStreamWriteOperator.getFactory(conf);
-      int bucketNum = conf.getInteger(FlinkOptions.BUCKET_INDEX_NUM_BUCKETS);
-      String indexKeyFields = conf.getString(FlinkOptions.INDEX_KEY_FIELD);
-      BucketIndexPartitioner<HoodieKey> partitioner = new BucketIndexPartitioner<>(bucketNum, indexKeyFields);
+      dataStream = addBucketBootstrapIfNecessary(conf, dataStream);
+      Partitioner<HoodieKey> partitioner = BucketIndexPartitioner.instance(conf);

Review Comment:
   Another option is a constant template for the initial hashing metadata, e.g., with fixed file group UUIDs. Contention is then acceptable, because concurrent writers all create metadata with the same content. This is similar to how we handle `.hoodie_partition_metadata`.
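A minimal sketch of the constant-template idea follows. It assumes a hypothetical one-line-per-node text layout (upper bound of the node's hash range, then the file group id) over the hash space `[0, Integer.MAX_VALUE]`; this is not Hudi's actual `HoodieConsistentHashingMetadata` format. The class `InitialHashingMetadata`, its methods, and the file name `initial.hashing_meta` are illustrative, and `java.nio.file` stands in for Hudi's storage abstraction. File group ids come from `UUID.nameUUIDFromBytes` over the partition path and bucket index, so every writer derives the same ids and therefore the same bytes. The write goes through a temp file plus an atomic rename, so readers never observe a partially written file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

/** Hypothetical sketch: a constant template for initial consistent-hashing metadata. */
public class InitialHashingMetadata {

  /** Fixed file group id: a name-based UUID of (partition, bucket) instead of a random one. */
  public static String fixedFileGroupId(String partitionPath, int bucket) {
    byte[] name = (partitionPath + "/" + bucket).getBytes(StandardCharsets.UTF_8);
    return UUID.nameUUIDFromBytes(name).toString();
  }

  /**
   * Builds the initial metadata for one partition: numBuckets nodes whose upper bounds split
   * the (assumed) hash space [0, Integer.MAX_VALUE] evenly, each mapped to a fixed file group id.
   * The output depends only on the arguments, so every writer produces identical content.
   */
  public static String initialMetadata(String partitionPath, int numBuckets) {
    StringBuilder sb = new StringBuilder();
    long spaceSize = (long) Integer.MAX_VALUE + 1;
    for (int i = 0; i < numBuckets; i++) {
      long upperBound = spaceSize * (i + 1) / numBuckets - 1;
      sb.append(upperBound).append(',').append(fixedFileGroupId(partitionPath, i)).append('\n');
    }
    return sb.toString();
  }

  /**
   * Publishes the metadata via a temp file in the same directory and an atomic rename.
   * If several writers race, whichever copy ends up in place is fine: all copies are equal.
   * Assumes target has a parent directory.
   */
  public static void saveInitialMetadata(Path target, String content) throws IOException {
    Path tmp = Files.createTempFile(target.getParent(), ".hashing_meta_", ".tmp");
    try {
      Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
      try {
        // Whether ATOMIC_MOVE replaces an existing target is platform-specific.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
      } catch (FileAlreadyExistsException e) {
        // Another writer already published the file; its content equals ours by construction.
      }
    } finally {
      Files.deleteIfExists(tmp); // no-op if the move succeeded
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("partition");
    Path target = dir.resolve("initial.hashing_meta");
    // Two "writers" compute the metadata independently...
    String a = initialMetadata("2022/10/11", 4);
    String b = initialMetadata("2022/10/11", 4);
    // ...and both try to save it.
    saveInitialMetadata(target, a);
    saveInitialMetadata(target, b);
    System.out.println(a.equals(b));
    System.out.print(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
  }
}
```

Because the content is a pure function of the partition path and bucket count, the race between writers reduces to which identical copy is renamed last, so no lock or conflict resolution is needed. Catching `FileAlreadyExistsException` covers platforms where an atomic rename refuses to replace an existing file.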


