alamb commented on issue #9929: URL: https://github.com/apache/arrow-datafusion/issues/9929#issuecomment-2041409720
> In my mind `RowSelection` is a file-level struct and i think there could be multi-files in one parquetExec so `ParquetSelection` should be `multi-files-level` right?

Yes, you are right. That is an excellent point.

> btw if user customized index is file level like page-index, i think directly use `RowSelection` is more easy way 🤔

I think one challenge with using `RowSelection` is that it is relative to the pages (or maybe the row group), rather than the overall file.

FWIW, what I hope to do over the next few weeks is whip up a little POC showing how one might build a specialized index on top of Parquet files as a demo, and then figure out what types of APIs would be needed in DataFusion.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
