alamb commented on code in PR #17022:
URL: https://github.com/apache/datafusion/pull/17022#discussion_r2267447842


##########
datafusion/datasource-parquet/src/file_format.rs:
##########
@@ -1033,24 +1038,62 @@ impl MetadataFetch for ObjectStoreFetch<'_> {
 /// through [`ParquetFileReaderFactory`].
 ///
 /// [`ParquetFileReaderFactory`]: crate::ParquetFileReaderFactory
-pub async fn fetch_parquet_metadata(
-    store: &dyn ObjectStore,
-    meta: &ObjectMeta,
+pub async fn fetch_parquet_metadata<F: MetadataFetch>(

Review Comment:
   I agree -- -- I am thinking the code that handles fetching metadata and 
schema is also getting enough options that it can probably be its own file / 
structure too. Maybe something like
   
   ```rust
   pub struct ParquetMetadataFetcher {
   ...
   }
   ```
   
   So we could use it like
   
   ```rust
   let fetcher = ParquetMetadataFetcher::new(object_store, path) 
     .with_hint(...);
   
   fetcher.fetch_metadata().await?
   ```
   
   🤔 



##########
datafusion/datasource-parquet/src/file_format.rs:
##########
@@ -312,6 +313,7 @@ async fn fetch_schema_with_location(
     file: &ObjectMeta,
     metadata_size_hint: Option<usize>,
     coerce_int96: Option<TimeUnit>,
+    file_metadata_cache: Option<Arc<dyn FileMetadataCache>>,

Review Comment:
   I re-reviewed these changes and since I think we always now have a 
file_metadata_cache we could probably make this be non optional. However, there 
are many tests that need to change



##########
datafusion/datasource-parquet/src/file_format.rs:
##########
@@ -1033,24 +1038,62 @@ impl MetadataFetch for ObjectStoreFetch<'_> {
 /// through [`ParquetFileReaderFactory`].
 ///
 /// [`ParquetFileReaderFactory`]: crate::ParquetFileReaderFactory
-pub async fn fetch_parquet_metadata(
-    store: &dyn ObjectStore,
-    meta: &ObjectMeta,
+pub async fn fetch_parquet_metadata<F: MetadataFetch>(

Review Comment:
   The more I looked the more it seemed like what would be helpful would be a 
whole new module `statistics.rs` or something



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to