stream2000 commented on code in PR #9199:
URL: https://github.com/apache/hudi/pull/9199#discussion_r1280497250


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDConsistentBucketBulkInsertPartitioner.java:
##########
@@ -144,9 +152,20 @@ private Map<String, Map<String, Integer>> generateFileIdPfx(Map<String, Consiste
       }
      partitionToFileIdPfxIdxMap.put(identifier.getMetadata().getPartitionPath(), fileIdPfxToIdx);
     }
-
     ValidationUtils.checkState(fileIdPfxList.size() == partitionToIdentifier.values().stream().mapToInt(ConsistentBucketIdentifier::getNumBuckets).sum(),
         "Error state after constructing fileId & idx mapping");
     return partitionToFileIdPfxIdxMap;
   }
+
+  @Override
+  public Option<WriteHandleFactory> getWriteHandleFactory(int idx) {
+    return super.getWriteHandleFactory(idx).map(writeHandleFactory -> new WriteHandleFactory() {
+      @Override
+      public HoodieWriteHandle create(HoodieWriteConfig config, String commitTime, HoodieTable hoodieTable, String partitionPath, String fileIdPrefix, TaskContextSupplier taskContextSupplier) {
+        // Ensure we do not create append handle for consistent hashing bulk_insert, align with `ConsistentBucketBulkInsertDataInternalWriterHelper`

Review Comment:
   For reviewers: When we bulk insert twice into a consistent hashing bucket index table, the second bulk insert needs to write logs into the existing file groups, whereas a normal bloom filter index table always creates new base files on bulk insert. However, the bulk insert row writer path currently does not support writing logs, so I added a check here to prevent users from bulk inserting twice into a consistent hashing bucket index table. For such tables, upsert should be used after the first bulk insert.

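   To illustrate the guard described above outside of Hudi, here is a minimal, self-contained Java sketch of the same pattern: wrapping a handle factory so that any attempt to produce an append-style handle fails fast. All names here (`HandleFactory`, `CreateHandle`, `AppendHandle`, `ConsistentBucketGuard`) are hypothetical stand-ins, not the actual Hudi classes from the diff.

```java
// Simplified model of "wrap the factory, reject append handles".
// NOT Hudi code; the real check lives in the overridden getWriteHandleFactory above.

interface SketchWriteHandle {}

class CreateHandle implements SketchWriteHandle {}   // models writing a new base file
class AppendHandle implements SketchWriteHandle {}   // models appending logs to an existing file group

interface HandleFactory {
  SketchWriteHandle create(String fileIdPrefix);
}

public class ConsistentBucketGuard {
  // Wrap the delegate factory: the bulk_insert row writer path cannot write
  // logs, so producing an append handle is an error state.
  static HandleFactory guard(HandleFactory delegate) {
    return fileIdPrefix -> {
      SketchWriteHandle handle = delegate.create(fileIdPrefix);
      if (handle instanceof AppendHandle) {
        throw new IllegalStateException(
            "Consistent hashing bulk_insert cannot write logs; use upsert after the first bulk insert");
      }
      return handle;
    };
  }

  public static void main(String[] args) {
    // A create-only factory passes through the guard untouched.
    HandleFactory createOnly = guard(prefix -> new CreateHandle());
    System.out.println(createOnly.create("f1").getClass().getSimpleName());

    // A factory that would append logs is rejected at handle-creation time.
    HandleFactory appendProducing = guard(prefix -> new AppendHandle());
    try {
      appendProducing.create("f2");
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

   The design point mirrors the diff: rather than validating the table/index configuration up front, the guard sits at the narrowest choke point (handle creation), so any future code path that routes an append through this partitioner also trips the check.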


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]