danny0405 commented on code in PR #14261:
URL: https://github.com/apache/hudi/pull/14261#discussion_r2552999363
##########
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/versioning/v2/ArchivedTimelineLoaderV2.java:
##########
@@ -56,29 +59,58 @@ public void loadInstants(HoodieTableMetaClient metaClient,
// List all files
List<String> fileNames = LSMTimeline.latestSnapshotManifest(metaClient,
metaClient.getArchivePath()).getFileNames();
+ // Check if consumer supports early termination
+ StoppableRecordConsumer stoppable = recordConsumer instanceof StoppableRecordConsumer
+ ? (StoppableRecordConsumer) recordConsumer
+ : null;
+
+ // Filter files by time range
+ List<String> filteredFiles = new ArrayList<>();
+ for (String fileName : fileNames) {
+ if (filter == null || LSMTimeline.isFileInRange(filter, fileName)) {
+ filteredFiles.add(fileName);
+ }
+ }
+
+ // Sort files in reverse chronological order if needed (newest first for limit queries)
+ if (stoppable != null && stoppable.needsReverseOrder()) {
Review Comment:
If there is no simple, clean way to plug in the limit logic, maybe we should
just add a separate method next to `ArchivedTimelineLoader.loadInstants` that
takes an explicit `StoppableRecordConsumer` param. The benefits:
1. get rid of the null check and the `instanceof` check;
2. always sort the files in reverse chronological order;
3. read the files in a single thread instead of in parallel.
A read with a limit is more of a range query than a full scan. By doing
this, we can freely plug in the logic required for limit while still keeping the
basic scan query efficient and clean.
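
To make the three points above concrete, here is a minimal, self-contained sketch of what such a dedicated limit path could look like. It is not Hudi code: `LimitedArchiveScan`, `StoppableConsumer`, `LimitConsumer`, `loadWithLimit`, and the `readFile` callback are all hypothetical names, and the sketch assumes that file names sort lexicographically in chronological order and that each file yields its records newest first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of a dedicated limit-read path; not Hudi's actual API. */
class LimitedArchiveScan {

  /** Hypothetical stand-in for the StoppableRecordConsumer discussed in the PR. */
  interface StoppableConsumer<T> {
    void accept(T record);

    boolean shouldStop();
  }

  /** Collects records until a fixed limit is reached. */
  static final class LimitConsumer<T> implements StoppableConsumer<T> {
    private final int limit;
    private final List<T> collected = new ArrayList<>();

    LimitConsumer(int limit) {
      this.limit = limit;
    }

    @Override
    public void accept(T record) {
      if (!shouldStop()) {
        collected.add(record);
      }
    }

    @Override
    public boolean shouldStop() {
      return collected.size() >= limit;
    }

    List<T> collected() {
      return collected;
    }
  }

  /**
   * Reads files newest first, in a single thread, and stops as soon as the
   * consumer is satisfied. The consumer is an explicit parameter, so no
   * null check or instanceof check is needed.
   *
   * Assumption: lexicographic order of file names matches chronological order.
   */
  static <T> void loadWithLimit(List<String> fileNames,
                                Function<String, List<T>> readFile,
                                StoppableConsumer<T> consumer) {
    // Always sort in reverse chronological order (copy to leave the input untouched).
    List<String> newestFirst = new ArrayList<>(fileNames);
    newestFirst.sort(Comparator.reverseOrder());

    // Sequential loop: a parallel read could not guarantee newest-first delivery.
    for (String fileName : newestFirst) {
      if (consumer.shouldStop()) {
        return; // skip opening older files entirely
      }
      for (T record : readFile.apply(fileName)) {
        if (consumer.shouldStop()) {
          return;
        }
        consumer.accept(record);
      }
    }
  }
}
```

Because the stop check runs before each file is opened, older files are never read once the limit is met, while the existing full-scan path can stay parallel and unchanged.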