codope commented on PR #11975: URL: https://github.com/apache/hudi/pull/11975#issuecomment-2365138352
> Can we just skip the file filtering if we find that the file in the index item is a log file when doing an RO query, so that we can eliminate the new API on the file index? An RO query on a fully compacted table could then also utilize the RLI index.

@danny0405 This is a good point. However, the index item in RLI contains only the fileId, not the file name, so the index entry alone does not tell us whether the record lives in a log file.

Moreover, I still think the new API makes sense. Consider a time travel query: suppose there is a past instant `t` that is a compaction instant, and the table was fully compacted at that instant. If a user runs a query with a record key predicate `as of instant t`, RLI would still return candidate files as per the latest snapshot. RO queries are essentially time travel as of the latest compaction time.

I think it is much cleaner to simply not use RLI for any query type other than snapshot queries. Later, when we add support for time travel on the metadata table, we can easily change the condition in the implementation of this API and use the index.
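To make the proposed condition concrete, here is a minimal sketch of the gating logic, written in Python for brevity. The names `QueryType` and `should_use_record_index` are hypothetical and are not Hudi's actual API; the sketch only assumes what is stated above, namely that RLI reflects the latest snapshot and so should be used for snapshot queries only.

```python
from enum import Enum
from typing import Optional


class QueryType(Enum):
    """Hypothetical query types, named for illustration only."""
    SNAPSHOT = "snapshot"
    READ_OPTIMIZED = "read_optimized"
    INCREMENTAL = "incremental"


def should_use_record_index(query_type: QueryType,
                            as_of_instant: Optional[str] = None) -> bool:
    """Decide whether record-key pruning may consult the record level index.

    Assumption: the index maps record keys to file groups as of the latest
    snapshot only, so any query that reads an older view of the table
    (time travel, or read-optimized, which behaves like time travel to the
    latest compaction) must skip it.
    """
    if as_of_instant is not None:
        # Time travel: the index may point at file groups that do not
        # match the table state at the requested instant.
        return False
    # Only a plain snapshot query sees the same view the index describes.
    return query_type is QueryType.SNAPSHOT


if __name__ == "__main__":
    for qt in QueryType:
        print(qt.name, should_use_record_index(qt))
```

If time travel support is later added to the metadata table, only the body of this one predicate would need to change, which is the main argument for keeping the decision behind a single API.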
