alamb commented on code in PR #11060:
URL: https://github.com/apache/datafusion/pull/11060#discussion_r1650084547


##########
datafusion/core/src/datasource/file_format/mod.rs:
##########
@@ -41,12 +42,27 @@ use crate::error::Result;
 use crate::execution::context::SessionState;
 use crate::physical_plan::{ExecutionPlan, Statistics};
 
-use datafusion_common::not_impl_err;
+use datafusion_common::file_options::file_type::FileType;
+use datafusion_common::{internal_err, not_impl_err, GetExt};
 use datafusion_physical_expr::{PhysicalExpr, PhysicalSortRequirement};
 
 use async_trait::async_trait;
+use file_compression_type::FileCompressionType;
 use object_store::{ObjectMeta, ObjectStore};
 
+/// A factory struct which produces [FileFormat] structs based on session and 
command level options

Review Comment:
   
   
   ```suggestion
   /// Factory for creating [`FileFormat`] instances based on session and 
command level options
   ///
   /// Users can provide their own `FileFormatFactory` to support arbitrary 
file formats
   ```



##########
datafusion/core/src/datasource/file_format/mod.rs:
##########
@@ -58,6 +74,15 @@ pub trait FileFormat: Send + Sync + fmt::Debug {
     /// downcast to a specific implementation.
     fn as_any(&self) -> &dyn Any;
 
+    /// Returns the extension for this FileFormat, e.g. "file.csv" -> csv
+    fn get_ext(&self) -> String;
+
+    /// Returns the extension for this FileFormat when compressed, e.g. 
"file.csv.gz" -> csv
+    fn get_ext_with_compression(
+        &self,
+        _file_compression_type: &FileCompressionType,
+    ) -> Result<String>;

Review Comment:
   Same question about owned String



##########
datafusion/core/src/datasource/file_format/mod.rs:
##########
@@ -58,6 +74,15 @@ pub trait FileFormat: Send + Sync + fmt::Debug {
     /// downcast to a specific implementation.
     fn as_any(&self) -> &dyn Any;
 
+    /// Returns the extension for this FileFormat, e.g. "file.csv" -> csv
+    fn get_ext(&self) -> String;

Review Comment:
   Minor: Is there any reason this has to return an owned `String`? Could it 
return `&str` ? 



##########
datafusion/core/src/datasource/file_format/mod.rs:
##########
@@ -106,6 +131,67 @@ pub trait FileFormat: Send + Sync + fmt::Debug {
     }
 }
 
+/// A container of [FileFormatFactory] which also implements [FileType].
+/// This enables converting a dyn FileFormat to a dyn FileType.
+/// The former trait is a superset of the latter trait, which includes 
execution time
+/// relevant methods. [FileType] is only used in logical planning and only 
implements
+/// the subset of methods required during logical planning.
+pub struct DefaultFileFormat {

Review Comment:
   I found the naming confusing -- I would have expected that 
`DefaultFileFormat` implemented `FileFormatFactory` or something
   
   Maybe we could call this `FileFormatSource` ?
   



##########
datafusion/core/src/datasource/file_format/mod.rs:
##########
@@ -106,6 +131,67 @@ pub trait FileFormat: Send + Sync + fmt::Debug {
     }
 }
 
+/// A container of [FileFormatFactory] which also implements [FileType].
+/// This enables converting a dyn FileFormat to a dyn FileType.
+/// The former trait is a superset of the latter trait, which includes 
execution time
+/// relevant methods. [FileType] is only used in logical planning and only 
implements
+/// the subset of methods required during logical planning.
+pub struct DefaultFileFormat {
+    file_format_factory: Arc<dyn FileFormatFactory>,
+}
+
+impl DefaultFileFormat {
+    /// Constructs a [DefaultFileFormat] wrapper from a [FileFormatFactory]
+    pub fn new(file_format_factory: Arc<dyn FileFormatFactory>) -> Self {
+        Self {
+            file_format_factory,
+        }
+    }
+}
+
+impl FileType for DefaultFileFormat {

Review Comment:
   Rather than this wrapper, I wonder if we could implement `FileType` for `dyn 
FileFormatFactory`
   
   ```rust
   impl FileType for &dyn FileFormatFactory{ 
   ...
   }
   ```
   
   And avoid having to have this struct 🤔 



##########
datafusion/core/src/execution/session_state.rs:
##########
@@ -237,6 +248,33 @@ impl SessionState {
             function_factory: None,
         };
 
+        #[cfg(feature = "parquet")]

Review Comment:
   👍 



##########
datafusion/core/src/execution/session_state.rs:
##########
@@ -237,6 +248,33 @@ impl SessionState {
             function_factory: None,
         };
 
+        #[cfg(feature = "parquet")]
+        match 
new_self.register_file_format(Arc::new(ParquetFormatFactory::new()), false)
+        {
+            Ok(_) => (),
+            Err(e) => log::info!("Unable to register default ParquetFormat: 
{e}"),
+        };

Review Comment:
   Another way to write this would be
   
   ```suggestion
           if let Err(e) = 
new_self.register_file_format(Arc::new(ParquetFormatFactory::new()), false)
           {
              log::info!("Unable to register default ParquetFormat: {e}"),
           };
   ```
   
   which might be more concise 🤔 



##########
datafusion/core/src/physical_planner.rs:
##########
@@ -786,32 +779,9 @@ impl DefaultPhysicalPlanner {
                     table_partition_cols,
                     overwrite: false,
                 };
-                let mut table_options = session_state.default_table_options();

Review Comment:
   this is great this code has been removed from the default planner



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to