majian1998 commented on code in PR #10325:
URL: https://github.com/apache/hudi/pull/10325#discussion_r1426358236


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java:
##########
@@ -342,11 +342,12 @@ private boolean deleteArchivedInstants(List<ActiveAction> activeActions, HoodieE
       );
     }
     if (!completedInstants.isEmpty()) {
-      context.foreach(
-          completedInstants,
-          instant -> activeTimeline.deleteInstantFileIfExists(instant),
-          Math.min(completedInstants.size(), config.getArchiveDeleteParallelism())
-      );
+      // Due to the concurrency between deleting completed instants and reading data,
+      // there may be hole in the timeline, which can lead to errors when reading data.
+      // Therefore, the concurrency of deleting completed instants is temporarily disabled,
+      // and instants are deleted in ascending order to prevent the occurrence of such holes.
+      completedInstants.stream()
+          .forEach(instant -> activeTimeline.deleteInstantFileIfExists(instant));
     }

Review Comment:
   Yes, I encountered this issue in an older version of Hudi, and I don't believe the current version has addressed it.
    For example, suppose the instants to archive (ArchiveToInstant), once sorted, are 1, 2, 3, 4. With a deletion parallelism of 2, instants 1 and 2 are processed on one thread and instants 3 and 4 on another. If deleting instant 1 is slow, instants 3 and 4 can be deleted first, which leaves an 'instant hole' in the timeline. A query that loads the timeline at that moment would, under the timeline visibility rules, treat the files written by instants 3 and 4 as invisible: those instants are newer than the earliest instant still on the active timeline (instant 1), yet they no longer appear on it.
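To make the race concrete, here is a toy model, not Hudi code. Instants are plain `long` values, the active timeline is a sorted set, and `isVisible` is a simplified, hypothetical stand-in for the visibility rule described above: an instant counts as committed if it is on the active timeline or older than the earliest instant still on it. The class and method names (`TimelineHoleDemo`, `isVisible`, `invisibleDuringDeletion`) are illustrative only. The model replays one deletion order and checks, after every single deletion, which already-deleted instants a reader would consider invisible. The order `3, 4, 1, 2` is just one possible interleaving of two workers; the ascending order corresponds to what the patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of the "instant hole"; none of these names are Hudi APIs. */
public class TimelineHoleDemo {

    /**
     * Hypothetical visibility rule: an instant is treated as committed if it is
     * still on the active timeline, or if it is older than the earliest instant
     * on the active timeline (so it must have been archived).
     */
    static boolean isVisible(long instant, TreeSet<Long> activeTimeline) {
        return activeTimeline.contains(instant)
            || activeTimeline.isEmpty()
            || instant < activeTimeline.first();
    }

    /**
     * Deletes instants from the active timeline in the given order and returns
     * every deleted instant that was invisible at some intermediate step, i.e.
     * one a reader could have missed had it loaded the timeline at that point.
     */
    static List<Long> invisibleDuringDeletion(List<Long> active, List<Long> deletionOrder) {
        TreeSet<Long> timeline = new TreeSet<>(active);
        List<Long> deleted = new ArrayList<>();
        TreeSet<Long> invisible = new TreeSet<>();
        for (long instant : deletionOrder) {
            timeline.remove(instant);
            deleted.add(instant);
            // A reader may load the timeline between any two deletions.
            for (long d : deleted) {
                if (!isVisible(d, timeline)) {
                    invisible.add(d);
                }
            }
        }
        return new ArrayList<>(invisible);
    }

    public static void main(String[] args) {
        List<Long> active = List.of(1L, 2L, 3L, 4L, 5L, 6L);
        // Sequential deletion in ascending order, as in the patch.
        System.out.println("sequential: "
            + invisibleDuringDeletion(active, List.of(1L, 2L, 3L, 4L)));
        // One interleaving with parallelism 2: worker A owns {1, 2}, worker B owns
        // {3, 4}, and B finishes both deletions before A deletes instant 1.
        System.out.println("parallel:   "
            + invisibleDuringDeletion(active, List.of(3L, 4L, 1L, 2L)));
    }
}
```

Running `main` prints `sequential: []` and `parallel:   [3, 4]`. Ascending deletion only ever removes the head of the timeline, so every deleted instant stays older than the new earliest instant and remains visible; out-of-order deletion briefly leaves 3 and 4 "after the start but missing", which is exactly the hole.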



-- 
This is an automated message from the Apache Git Service.