alamb commented on code in PR #2572:
URL: https://github.com/apache/arrow-datafusion/pull/2572#discussion_r878686331


##########
datafusion/core/src/datasource/file_format/mod.rs:
##########
@@ -63,8 +67,9 @@ pub trait FileFormat: Send + Sync + fmt::Debug {
     /// TODO: should the file source return statistics for only columns 
referred to in the table schema?
     async fn infer_stats(
         &self,
-        reader: Arc<dyn ObjectReader>,
+        store: &dyn ObjectStore,

Review Comment:
   what is the reason for the change to `&dyn ObjectStore`? Couldn't this make 
it harder to pass the object store on to an inner implementation (that might 
need an `Arc<ObjectStore>`)?



##########
datafusion/core/src/datasource/file_format/mod.rs:
##########
@@ -63,8 +67,9 @@ pub trait FileFormat: Send + Sync + fmt::Debug {
     /// TODO: should the file source return statistics for only columns 
referred to in the table schema?
     async fn infer_stats(
         &self,
-        reader: Arc<dyn ObjectReader>,
+        store: &dyn ObjectStore,

Review Comment:
   ```suggestion
           store: &Arc<dyn ObjectStore>,
   ```



##########
datafusion/core/src/datasource/file_format/mod.rs:
##########
@@ -52,7 +52,11 @@ pub trait FileFormat: Send + Sync + fmt::Debug {
     /// be analysed up to a given number of records or files (as specified in 
the
     /// format config) then give the estimated common schema. This might fail 
if
     /// the files have schemas that cannot be merged.
-    async fn infer_schema(&self, readers: ObjectReaderStream) -> 
Result<SchemaRef>;
+    async fn infer_schema(
+        &self,
+        store: &dyn ObjectStore,

Review Comment:
   ```suggestion
           store: &Arc<dyn ObjectStore>,
   ```



##########
datafusion/core/src/datasource/file_format/mod.rs:
##########
@@ -75,3 +80,51 @@ pub trait FileFormat: Send + Sync + fmt::Debug {
         filters: &[Expr],
     ) -> Result<Arc<dyn ExecutionPlan>>;
 }
+
+#[cfg(test)]
+pub(crate) mod test_util {
+    use super::*;
+    use crate::datasource::listing::PartitionedFile;
+    use datafusion_data_access::object_store::local::{
+        local_unpartitioned_file, LocalFileSystem,
+    };
+
+    pub async fn get_exec_format(

Review Comment:
   ```suggestion
       pub async fn scan_from_format(
   ```
   
   I found this name somewhat confusing as it is producing an execution plan 
based on a file format



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to