xicm commented on code in PR #12101:
URL: https://github.com/apache/hudi/pull/12101#discussion_r1802253444


##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestSparkNonBlockingConcurrencyControl.java:
##########
@@ -213,6 +215,23 @@ public void testNonBlockingConcurrencyControlWithInflightInstant() throws Except
     checkWrittenData(result, 1);
   }
 
+  @ParameterizedTest
+  @EnumSource(value = WriteOperationType.class, names = {"BULK_INSERT", "INSERT", "UPSERT"})
+  public void testFileIdWithNonBlockingConcurrencyControl(WriteOperationType operationType) throws Exception {
+    HoodieWriteConfig config = createHoodieWriteConfig();
+    metaClient = getHoodieMetaClient(HoodieTableType.MERGE_ON_READ, config.getProps());
+
+    SparkRDDWriteClient client = getHoodieWriteClient(config);
+    List<String> dataset = Collections.singletonList("id0,Danny,0,0,par1");
+    String insertTime0 = client.createNewInstantTime();
+    List<WriteStatus> writeStatuses = writeData(client, insertTime0, dataset, true, operationType);
+    for (WriteStatus status : writeStatuses) {
+      String fileID = status.getFileId();
+      assertTrue(fileID.endsWith(CONSTANT_FILE_ID_SUFFIX + "-0"));

Review Comment:
   Without NBCC, INSERT, UPSERT, and BULK_INSERT all produce file IDs with a suffix. With NBCC enabled, INSERT and UPSERT produce file IDs without a suffix.
   
   With NBCC enabled, if an INSERT is followed by a BULK_INSERT, the fileId of the INSERT has no suffix (00000000-0000-0000-0000-000000000000-0_20241008155540193.log.1_0-50-74), and the fileId of the BULK_INSERT becomes .00000000-0000-0000-0000-0_20241008155118321.log.1_0-97-142.
   
   The reason is that we use the fileId prefix to do the append. The prefix comes from:
   
https://github.com/apache/hudi/blob/ece8d7c1b2740851daea9a1b9c98fa781d51e4f3/hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java#L303-L309
   
   
https://github.com/apache/hudi/blob/ece8d7c1b2740851daea9a1b9c98fa781d51e4f3/hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java#L85
   
   To be consistent with the other behaviors, I think INSERT/UPSERT with NBCC should also have a suffix, even though the suffix itself carries no meaning.
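
   A minimal sketch of why a missing suffix matters, assuming the prefix is taken as everything before the last `-` and a new file ID is built as prefix + `-` + index. The class and helper names here are hypothetical stand-ins for illustration, not the actual FSUtils code linked above:

```java
class FileIdPrefixSketch {
  // Hypothetical helper (assumption): the prefix is everything before the last '-'.
  public static String prefixOf(String fileId) {
    return fileId.substring(0, fileId.lastIndexOf('-'));
  }

  // Hypothetical helper (assumption): a file ID is the prefix plus a numeric suffix.
  public static String withSuffix(String prefix, int index) {
    return prefix + "-" + index;
  }

  public static void main(String[] args) {
    String uuid = "00000000-0000-0000-0000-000000000000";

    // File ID written with a suffix: stripping the suffix recovers the full UUID.
    String suffixed = withSuffix(uuid, 0);
    System.out.println(prefixOf(suffixed));

    // File ID written without a suffix: the last UUID segment is mistaken for
    // the suffix and dropped, so the rebuilt file ID is truncated.
    System.out.println(withSuffix(prefixOf(uuid), 0));
  }
}
```

   Under these assumptions, a writer that always strips a trailing segment only round-trips correctly when every operation type appends one, which is the consistency argument above.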
   


