xicm commented on code in PR #12101:
URL: https://github.com/apache/hudi/pull/12101#discussion_r1802253444
##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestSparkNonBlockingConcurrencyControl.java:
##########
@@ -213,6 +215,23 @@ public void
testNonBlockingConcurrencyControlWithInflightInstant() throws Except
checkWrittenData(result, 1);
}
+ @ParameterizedTest
+ @EnumSource(value = WriteOperationType.class, names = {"BULK_INSERT", "INSERT", "UPSERT"})
+ public void testFileIdWithNonBlockingConcurrencyControl(WriteOperationType operationType) throws Exception {
+ HoodieWriteConfig config = createHoodieWriteConfig();
+ metaClient = getHoodieMetaClient(HoodieTableType.MERGE_ON_READ, config.getProps());
+
+ SparkRDDWriteClient client = getHoodieWriteClient(config);
+ List<String> dataset = Collections.singletonList("id0,Danny,0,0,par1");
+ String insertTime0 = client.createNewInstantTime();
+ List<WriteStatus> writeStatuses = writeData(client, insertTime0, dataset, true, operationType);
+ for (WriteStatus status : writeStatuses) {
+ String fileID = status.getFileId();
+ assertTrue(fileID.endsWith(CONSTANT_FILE_ID_SUFFIX + "-0"));
Review Comment:
Without NBCC, INSERT, UPSERT, and BULK_INSERT all produce fileIds with a suffix. With NBCC
enabled, INSERT and UPSERT produce fileIds without one.
With NBCC enabled, if an INSERT is followed by a BULK_INSERT, the fileId of the INSERT has no
suffix
(00000000-0000-0000-0000-000000000000-0_20241008155540193.log.1_0-50-74), and
the fileId of the BULK_INSERT becomes
.00000000-0000-0000-0000-0_20241008155118321.log.1_0-97-142.
The reason is that we use the fileId prefix when appending. The prefix comes from:
https://github.com/apache/hudi/blob/ece8d7c1b2740851daea9a1b9c98fa781d51e4f3/hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java#L303-L309
https://github.com/apache/hudi/blob/ece8d7c1b2740851daea9a1b9c98fa781d51e4f3/hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java#L85
To be consistent with the other cases, I think INSERT/UPSERT with NBCC should
also have a suffix, even though the suffix carries no meaning there.
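A minimal sketch of why the missing suffix matters, assuming the prefix is taken as everything before the last hyphen of the fileId. The class and method names below are hypothetical and only illustrate the string handling; they are not Hudi's API.

```java
// Hypothetical illustration, not Hudi code: shows how cutting a fileId at its
// last '-' behaves when the fileId does or does not carry a "-<n>" suffix.
final class FileIdPrefixSketch {

    // Assumption: the prefix is everything before the last hyphen.
    static String prefixOf(String fileId) {
        return fileId.substring(0, fileId.lastIndexOf('-'));
    }

    // Assumption: a fileId is rebuilt as "<prefix>-<index>".
    static String withSuffix(String prefix, int index) {
        return prefix + "-" + index;
    }

    public static void main(String[] args) {
        String suffixed = "00000000-0000-0000-0000-000000000000-0";
        String unsuffixed = "00000000-0000-0000-0000-000000000000";

        // With a suffix, the round trip returns the original fileId.
        System.out.println(withSuffix(prefixOf(suffixed), 0));

        // Without a suffix, the last UUID group is cut off instead,
        // giving 00000000-0000-0000-0000-0.
        System.out.println(withSuffix(prefixOf(unsuffixed), 0));
    }
}
```

Under that assumption, the second case reproduces the truncated 00000000-0000-0000-0000-0 seen in the BULK_INSERT log file name above, which is why giving INSERT/UPSERT a suffix keeps the append path consistent.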