progval opened a new pull request, #12593:
URL: https://github.com/apache/datafusion/pull/12593

   ## Which issue does this PR close?
   
   Closes #12592.
   
   ## Rationale for this change
   
   This allows users to, for example, cache the Page Index so it does not need 
to be parsed every time we open the file.
   
   I have a demo here:
https://gitlab.softwareheritage.org/swh/devel/swh-provenance/-/merge_requests/182

   The key piece is a `CachingParquetFormatFactory`/`CachingParquetFormat` 
pair that behaves like `ParquetFormatFactory`/`ParquetFormat`, except that it 
passes `ParquetExecBuilder::with_parquet_file_reader_factory` a file reader 
factory that keeps a pool of readers (keyed by file path).
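
To illustrate the "pool keyed by file path" idea, here is a minimal, std-only sketch. It is not the code from the demo and does not use the DataFusion or parquet crates: `ParsedMetadata`, `CachingPool`, and `get_or_parse` are hypothetical names, and the "parse" step is a placeholder for whatever expensive work (such as decoding the Page Index) a real reader would do.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for parsed Parquet metadata (e.g. the Page Index).
#[derive(Debug)]
pub struct ParsedMetadata {
    pub path: String,
}

/// Hypothetical pool that remembers one parsed entry per file path.
#[derive(Default)]
pub struct CachingPool {
    by_path: Mutex<HashMap<String, Arc<ParsedMetadata>>>,
    /// Counts how many times the (simulated) expensive parse actually ran.
    parses: AtomicUsize,
}

impl CachingPool {
    /// Returns the cached entry for `path`, parsing only on the first request.
    pub fn get_or_parse(&self, path: &str) -> Arc<ParsedMetadata> {
        let mut by_path = self.by_path.lock().unwrap();
        if let Some(hit) = by_path.get(path) {
            return Arc::clone(hit);
        }
        // Placeholder for the expensive step a real reader would perform.
        self.parses.fetch_add(1, Ordering::Relaxed);
        let parsed = Arc::new(ParsedMetadata {
            path: path.to_string(),
        });
        by_path.insert(path.to_string(), Arc::clone(&parsed));
        parsed
    }

    /// Number of cache misses so far.
    pub fn parse_count(&self) -> usize {
        self.parses.load(Ordering::Relaxed)
    }
}
```

Requesting the same path twice returns the same `Arc` and runs the placeholder parse once. A real factory would hand out readers rather than bare metadata, and would likely need an eviction policy, which this sketch omits.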
   
   ## What changes are included in this PR?
   
   * Rename the `ParquetFileReader` struct to `DefaultParquetFileReader`
   * Add a new `ParquetFileReader` trait that extends `AsyncFileReader` with a 
`load_metadata` method
   * Call `load_metadata` from `<ParquetOpener as FileOpener>::open`
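
The shape of this change can be modelled with a small, std-only sketch. Everything here is an assumption made for illustration: `BaseReader` stands in for `AsyncFileReader` (and is synchronous, unlike the real trait), `MetadataReader` stands in for the new `ParquetFileReader` trait, and the `load_metadata` signature shown is not taken from the PR. The point is only that the opener calls a trait method, so an implementation can override it to cache.

```rust
use std::sync::Arc;

/// Hypothetical, synchronous stand-in for parquet's `AsyncFileReader`.
pub trait BaseReader {
    fn read_footer(&mut self) -> Vec<u8>;
}

/// Hypothetical stand-in for parsed file metadata.
#[derive(Debug, PartialEq)]
pub struct Metadata {
    pub num_bytes: usize,
}

/// Models the new trait: a base reader plus an overridable metadata hook.
pub trait MetadataReader: BaseReader {
    /// Default behaviour: read and "parse" the footer on every call.
    fn load_metadata(&mut self) -> Arc<Metadata> {
        let footer = self.read_footer();
        Arc::new(Metadata { num_bytes: footer.len() })
    }
}

/// Models the renamed default reader; it keeps the default hook.
pub struct DefaultReader {
    pub footer: Vec<u8>,
    pub reads: usize,
}

impl BaseReader for DefaultReader {
    fn read_footer(&mut self) -> Vec<u8> {
        self.reads += 1;
        self.footer.clone()
    }
}

impl MetadataReader for DefaultReader {}

/// A user-defined reader that overrides the hook to parse only once.
pub struct CachingReader {
    inner: DefaultReader,
    cached: Option<Arc<Metadata>>,
}

impl CachingReader {
    pub fn new(inner: DefaultReader) -> Self {
        Self { inner, cached: None }
    }

    /// How many times the underlying footer was actually read.
    pub fn reads(&self) -> usize {
        self.inner.reads
    }
}

impl BaseReader for CachingReader {
    fn read_footer(&mut self) -> Vec<u8> {
        self.inner.read_footer()
    }
}

impl MetadataReader for CachingReader {
    fn load_metadata(&mut self) -> Arc<Metadata> {
        if let Some(hit) = &self.cached {
            return Arc::clone(hit);
        }
        let parsed = self.inner.load_metadata();
        self.cached = Some(Arc::clone(&parsed));
        parsed
    }
}

/// Models the call site in the opener: it depends only on the trait.
pub fn open(reader: &mut dyn MetadataReader) -> Arc<Metadata> {
    reader.load_metadata()
}
```

With the default reader, two calls to `open` read the footer twice; with the caching reader, the second call returns the stored `Arc` without touching the file again.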
   
   ## Are these changes tested?
   
   Not within the repo. Should I add a new module in `datafusion-examples/`, 
adapted from my demo above?
   
   ## Are there any user-facing changes?
   
   Breaking change for any user who implements `ParquetFileReaderFactory`.



