xuzifu666 commented on code in PR #10898:
URL: https://github.com/apache/hudi/pull/10898#discussion_r1535034946
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/HoodieSimpleBucketIndex.java:
##########
@@ -61,6 +64,20 @@ public Map<Integer, HoodieRecordLocation> loadBucketIdToFileIdMappingForPartitio
       if (!bucketIdToFileIdMapping.containsKey(bucketId)) {
        bucketIdToFileIdMapping.put(bucketId, new HoodieRecordLocation(commitTime, fileId));
       } else {
+        // If hoodie.write.bucketid.multiple.delete.partition is enabled, delete the partition so the next write can succeed
+        if (config.getWhetherDeletePartitonWhenBucketIdMultiple()) {
+          Path partitionPath = FSUtils.getPartitionPath(hoodieTable.getMetaClient().getBasePathV2(), partition);
+          try {
+            hoodieTable.getMetaClient().getFs().delete(partitionPath, true);
+          } catch (IOException e) {
+            throw new HoodieIOException("Found multiple files at partition path="
+                + partition + " belonging to the same bucket id = " + bucketId, e);
+          }
+          throw new HoodieIOException("Found multiple files at partition path="
+              + partition + " belonging to the same bucket id = " + bucketId
+              + ", and the partition path has been deleted; you can retry the job");
+        }
+
Review Comment:
Yes, it is an accidental condition; it can also occur when Spark speculation is enabled or when there are multiple writers. From actual testing:
1. If execution reaches this code, it has to handle the case where multiple files map to the same bucket id.
2. In this condition the partition data is dirty and not usable by the user.
3. Retrying the write to the partition would report the same error every time.
4. Deleting the partition so that a retried job can succeed is more fitting and more automatic, and it does not introduce extra risk (see the sketch after this list).
5. The user can decide whether to enable this behavior.
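
To make the intended recovery flow concrete, here is a standalone sketch (not the PR code itself): it detects two data files that share one bucket id in a partition and, when the flag is set, deletes the whole partition so a retried job starts from a clean state. It assumes the bucket id is encoded as a zero-padded 8-digit prefix of the data file name, mirroring the simple bucket index naming; the class name, the `deletePartitionOnDuplicateBucket` flag, and the prefix parsing are hypothetical.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Standalone sketch, not Hudi code: detect two data files that share one
// bucket id in a partition and, if the (hypothetical) flag is set, delete
// the whole partition so a retried job starts from a clean state.
public class DuplicateBucketRecoverySketch {

  // Assumption: the bucket id is the zero-padded 8-digit prefix of the file
  // name, mirroring the simple bucket index naming; returns -1 otherwise.
  static int parseBucketId(String fileName) {
    if (fileName.length() < 8) {
      return -1;
    }
    for (int i = 0; i < 8; i++) {
      if (!Character.isDigit(fileName.charAt(i))) {
        return -1;
      }
    }
    return Integer.parseInt(fileName.substring(0, 8));
  }

  static void checkPartition(FileSystem fs, Path partitionPath,
                             boolean deletePartitionOnDuplicateBucket) throws IOException {
    Map<Integer, String> bucketToFile = new HashMap<>();
    for (FileStatus status : fs.listStatus(partitionPath)) {
      String name = status.getPath().getName();
      int bucketId = parseBucketId(name);
      if (bucketId < 0) {
        continue; // not a bucketed data file
      }
      String previous = bucketToFile.putIfAbsent(bucketId, name);
      if (previous == null) {
        continue; // first file seen for this bucket id
      }
      // Point 2 above: the partition is dirty; a plain retry (point 3)
      // would keep hitting this same check.
      if (deletePartitionOnDuplicateBucket) {
        // Point 4: drop the partition so the next attempt can succeed.
        fs.delete(partitionPath, true);
        throw new IllegalStateException("Files " + previous + " and " + name
            + " share bucket id " + bucketId + " under " + partitionPath
            + "; the partition was deleted, retry the job");
      }
      throw new IllegalStateException("Files " + previous + " and " + name
          + " share bucket id " + bucketId + " under " + partitionPath);
    }
  }

  public static void main(String[] args) throws IOException {
    // Usage: pass a partition path; relies on the default Hadoop configuration.
    FileSystem fs = FileSystem.get(new Configuration());
    checkPartition(fs, new Path(args[0]), true);
  }
}
```

Deleting at partition granularity is coarse, but once two files share a bucket id the partition is already unreadable, so nothing usable is lost and the recovery decision stays simple.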