Re: [PR] feat: [HUDI-9766] Support for show_timeline Procedure with appropriate start and end time for both active and archive timelines [hudi]

via GitHub Sat, 22 Nov 2025 02:59:25 -0800


danny0405 commented on code in PR #14261:
URL: https://github.com/apache/hudi/pull/14261#discussion_r2552999363



##########
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/versioning/v2/ArchivedTimelineLoaderV2.java:
##########
@@ -56,29 +59,58 @@ public void loadInstants(HoodieTableMetaClient metaClient,
       // List all files
       List<String> fileNames = LSMTimeline.latestSnapshotManifest(metaClient, metaClient.getArchivePath()).getFileNames();
 
+      // Check if consumer supports early termination
+      StoppableRecordConsumer stoppable = recordConsumer instanceof StoppableRecordConsumer
+          ? (StoppableRecordConsumer) recordConsumer
+          : null;
+
+      // Filter files by time range
+      List<String> filteredFiles = new ArrayList<>();
+      for (String fileName : fileNames) {
+        if (filter == null || LSMTimeline.isFileInRange(filter, fileName)) {
+          filteredFiles.add(fileName);
+        }
+      }
+
+      // Sort files in reverse chronological order if needed (newest first for limit queries)
+      if (stoppable != null && stoppable.needsReverseOrder()) {

Review Comment:
   If we do not have a simple, clean way to plug in the limit logic, maybe we 
can just add a separate method alongside `ArchivedTimelineLoader.loadInstants` 
with an explicit `StoppableRecordConsumer` param. The benefits:
   
   1. get rid of the null check and the `instanceof` check;
   2. always sort the files in reverse chronological order;
   3. read the files in a single thread instead of in parallel.
   
   A read with a limit is closer to a range query than a full scan. By doing 
this, we can freely plug in the logic required for the limit while still 
keeping the basic scan query efficient and clean.
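
   To make the suggestion concrete, here is a minimal, self-contained sketch 
of what such a dedicated limit path could look like. It is not Hudi code: the 
`StoppableRecordConsumer` interface shown here, `LimitConsumer`, 
`loadInstantsWithLimit`, the file names, and the in-memory map standing in for 
archive files are all hypothetical, and it assumes file names sort 
lexicographically in chronological order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LimitedArchiveScanSketch {

  /** Hypothetical consumer that can ask the loader to stop; not Hudi's actual interface. */
  interface StoppableRecordConsumer {
    void accept(String instantTime);

    boolean shouldStop();
  }

  /** Collects at most {@code limit} instants, then signals the loader to stop. */
  static final class LimitConsumer implements StoppableRecordConsumer {
    private final int limit;
    final List<String> collected = new ArrayList<>();

    LimitConsumer(int limit) {
      this.limit = limit;
    }

    @Override
    public void accept(String instantTime) {
      collected.add(instantTime);
    }

    @Override
    public boolean shouldStop() {
      return collected.size() >= limit;
    }
  }

  /**
   * Dedicated limit path: the consumer type is explicit (no null or instanceof check),
   * files are always visited newest first, and they are read sequentially.
   * The map is a stand-in for real file reads: file name -> instant times in that file.
   * Returns the number of files that were actually opened.
   */
  static int loadInstantsWithLimit(Map<String, List<String>> files,
                                   StoppableRecordConsumer consumer) {
    List<String> names = new ArrayList<>(files.keySet());
    // Assumption: names sort lexicographically in chronological order.
    names.sort(Comparator.reverseOrder());

    int filesRead = 0;
    for (String name : names) {
      if (consumer.shouldStop()) {
        break; // limit reached, skip all older files
      }
      filesRead++;
      List<String> instants = new ArrayList<>(files.get(name));
      instants.sort(Comparator.reverseOrder()); // newest instant first within a file
      for (String instant : instants) {
        if (consumer.shouldStop()) {
          break;
        }
        consumer.accept(instant);
      }
    }
    return filesRead;
  }

  public static void main(String[] args) {
    Map<String, List<String>> files = new TreeMap<>();
    files.put("001_003.parquet", List.of("001", "002", "003"));
    files.put("004_006.parquet", List.of("004", "005", "006"));
    files.put("007_009.parquet", List.of("007", "008", "009"));

    LimitConsumer consumer = new LimitConsumer(2);
    int filesRead = loadInstantsWithLimit(files, consumer);
    System.out.println(consumer.collected + " from " + filesRead + " file(s)");
  }
}
```

   Because the loop is sequential and newest first, the limit is satisfied 
from the most recent file and older files are never opened, while the existing 
full-scan path can stay parallel and unchanged.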



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

