alamb commented on code in PR #17022: URL: https://github.com/apache/datafusion/pull/17022#discussion_r2267447842
########## datafusion/datasource-parquet/src/file_format.rs: ########## @@ -1033,24 +1038,62 @@ impl MetadataFetch for ObjectStoreFetch<'_> { /// through [`ParquetFileReaderFactory`]. /// /// [`ParquetFileReaderFactory`]: crate::ParquetFileReaderFactory -pub async fn fetch_parquet_metadata( - store: &dyn ObjectStore, - meta: &ObjectMeta, +pub async fn fetch_parquet_metadata<F: MetadataFetch>( Review Comment: I agree -- -- I am thinking the code that handles fetching metadata and schema is also getting enough options that it can probably be its own file / structure too. Maybe something like ```rust pub struct ParquetMetadataFetcher { ... } ``` So we could use it like ```rust let fetcher = ParquetMetadataFetcher::new(object_store, path) .with_hint(...); fetcher.fetch_metadata().await? ``` 🤔 ########## datafusion/datasource-parquet/src/file_format.rs: ########## @@ -312,6 +313,7 @@ async fn fetch_schema_with_location( file: &ObjectMeta, metadata_size_hint: Option<usize>, coerce_int96: Option<TimeUnit>, + file_metadata_cache: Option<Arc<dyn FileMetadataCache>>, Review Comment: I re-reviewed these changes and since I think we always now have a file_metadata_cache we could probably make this be non optional. However, there are many tests that need to change ########## datafusion/datasource-parquet/src/file_format.rs: ########## @@ -1033,24 +1038,62 @@ impl MetadataFetch for ObjectStoreFetch<'_> { /// through [`ParquetFileReaderFactory`]. /// /// [`ParquetFileReaderFactory`]: crate::ParquetFileReaderFactory -pub async fn fetch_parquet_metadata( - store: &dyn ObjectStore, - meta: &ObjectMeta, +pub async fn fetch_parquet_metadata<F: MetadataFetch>( Review Comment: The more I looked the more it seemed like what would be helpful would be a whole new module `statistics.rs` or something -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org