Re: [PR] [HUDI-7207] During archiving, complete instants are deleted serially to prevent data errors during data reads. [hudi]

via GitHub Thu, 14 Dec 2023 03:31:10 -0800


danny0405 commented on code in PR #10325:
URL: https://github.com/apache/hudi/pull/10325#discussion_r1426597255



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java:
##########
@@ -342,11 +342,12 @@ private boolean deleteArchivedInstants(List<ActiveAction> activeActions, HoodieE
       );
     }
     if (!completedInstants.isEmpty()) {
-      context.foreach(
-          completedInstants,
-          instant -> activeTimeline.deleteInstantFileIfExists(instant),
-          Math.min(completedInstants.size(), config.getArchiveDeleteParallelism())
-      );
+      // Due to the concurrency between deleting completed instants and reading data,
+      // there may be hole in the timeline, which can lead to errors when reading data.
+      // Therefore, the concurrency of deleting completed instants is temporarily disabled,
+      // and instants are deleted in ascending order to prevent the occurrence of such holes.
+      completedInstants.stream()
+          .forEach(instant -> activeTimeline.deleteInstantFileIfExists(instant));
     }

Review Comment:
   Before release 1.0, Hudi relies on the existing file naming for file
slicing. As long as there is a parquet base file in the file group, the log
files eventually use that parquet file's instant time as their base instant
time. File slicing does not depend on the commit metadata, so the archiving
order should not affect the file slice version.
   
   Since release 1.0, we use completion-time-based file slicing, and the
removal order does not matter because the completion time of a log file is
deterministic.
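
To illustrate the pre-1.0 point, here is a minimal sketch, not Hudi's actual implementation. The class `FileSliceSketch`, the type `DataFile`, and the method `sliceByBaseInstant` are hypothetical names introduced only for this example, and the file names are simplified placeholders. It assumes each file carries the base instant time taken from its name, and groups files into slices from that value alone, without consulting the timeline:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FileSliceSketch {

    /** Hypothetical, simplified stand-in for one file in a file group. */
    public static final class DataFile {
        final String name;
        // Assumption: the base instant time is encoded in the file name,
        // so it is known without reading any commit metadata.
        final String baseInstantTime;

        public DataFile(String name, String baseInstantTime) {
            this.name = name;
            this.baseInstantTime = baseInstantTime;
        }
    }

    /**
     * Groups file names into slices keyed by base instant time.
     * The method takes no timeline argument: which instants are still
     * active or already archived cannot change the result.
     */
    public static Map<String, List<String>> sliceByBaseInstant(List<DataFile> files) {
        Map<String, List<String>> slices = new TreeMap<>();
        for (DataFile f : files) {
            slices.computeIfAbsent(f.baseInstantTime, k -> new ArrayList<>()).add(f.name);
        }
        return slices;
    }
}
```

Because the grouping key comes only from the file names in this sketch, deleting completed instants from the active timeline in any order leaves the slices unchanged.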



