progval opened a new pull request, #12593: URL: https://github.com/apache/datafusion/pull/12593
## Which issue does this PR close?

Closes #12592.

## Rationale for this change

This allows users to, for example, cache the Page Index so it does not need to be parsed every time we open the file.

I have a demo here: https://gitlab.softwareheritage.org/swh/devel/swh-provenance/-/merge_requests/182 . The key piece is a `CachingParquetFormatFactory`/`CachingParquetFormat` pair that acts like `ParquetFormatFactory`/`ParquetFormat`, except that it calls `ParquetExecBuilder::with_parquet_file_reader_factory` with a file reader factory that keeps a pool of readers (keyed by file path).

## What changes are included in this PR?

* Rename the `ParquetFileReader` struct to `DefaultParquetFileReader`
* Add a new `ParquetFileReader` trait that extends `AsyncFileReader` with a `load_metadata` method
* Call it from `<ParquetOpener as FileOpener>::open`

## Are these changes tested?

Not within the repo. Should I add a new module in `datafusion-examples/`, adapted from my demo above?

## Are there any user-facing changes?

Breaking change for any user who implements `ParquetFileReaderFactory`.
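
To illustrate the caching idea from the rationale, here is a minimal, std-only Rust sketch of a metadata pool keyed by file path. It is not the code from this PR or from the linked demo, and it does not use DataFusion or `parquet` APIs: `FileMetadata`, `MetadataCache`, `get_or_load` and `parses_for_two_opens` are hypothetical names chosen for illustration only.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for parsed Parquet metadata (footer plus Page Index).
/// The real type lives in the `parquet` crate and is not modelled here.
#[derive(Debug)]
pub struct FileMetadata {
    pub path: String,
    pub num_row_groups: usize,
}

/// Hypothetical cache keyed by file path. Each file's metadata is parsed
/// at most once; later opens share the same value through an `Arc`.
#[derive(Default)]
pub struct MetadataCache {
    entries: Mutex<HashMap<String, Arc<FileMetadata>>>,
}

impl MetadataCache {
    /// Returns the cached metadata for `path`, calling `load` only on a miss.
    /// The lock is held while `load` runs, which keeps the sketch simple but
    /// serializes concurrent loads; a real implementation may want otherwise.
    pub fn get_or_load<F>(&self, path: &str, load: F) -> Arc<FileMetadata>
    where
        F: FnOnce(&str) -> FileMetadata,
    {
        let mut entries = self.entries.lock().unwrap();
        if let Some(hit) = entries.get(path) {
            return Arc::clone(hit);
        }
        let loaded = Arc::new(load(path));
        entries.insert(path.to_string(), Arc::clone(&loaded));
        loaded
    }
}

/// Opens the same (made-up) file twice and reports how often it was parsed.
pub fn parses_for_two_opens() -> usize {
    let cache = MetadataCache::default();
    let mut parses = 0;
    for _ in 0..2 {
        cache.get_or_load("data/part-0.parquet", |p| {
            parses += 1; // counts real parses, not cache hits
            FileMetadata { path: p.to_string(), num_row_groups: 4 }
        });
    }
    parses
}
```

In this sketch the second open of the same path is served from the map, so the loader closure runs only once. A reader factory along the lines described in the PR could consult such a pool before deciding whether `load_metadata` needs to touch the file at all.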