alamb commented on code in PR #11060:
URL: https://github.com/apache/datafusion/pull/11060#discussion_r1650084547
##########
datafusion/core/src/datasource/file_format/mod.rs:
##########
@@ -41,12 +42,27 @@ use crate::error::Result;
use crate::execution::context::SessionState;
use crate::physical_plan::{ExecutionPlan, Statistics};
-use datafusion_common::not_impl_err;
+use datafusion_common::file_options::file_type::FileType;
+use datafusion_common::{internal_err, not_impl_err, GetExt};
use datafusion_physical_expr::{PhysicalExpr, PhysicalSortRequirement};
use async_trait::async_trait;
+use file_compression_type::FileCompressionType;
use object_store::{ObjectMeta, ObjectStore};
+/// A factory struct which produces [FileFormat] structs based on session and
command level options
Review Comment:
```suggestion
/// Factory for creating [`FileFormat`] instances based on session and
command level options
///
/// Users can provide their own `FileFormatFactory` to support arbitrary
file formats
```
##########
datafusion/core/src/datasource/file_format/mod.rs:
##########
@@ -58,6 +74,15 @@ pub trait FileFormat: Send + Sync + fmt::Debug {
/// downcast to a specific implementation.
fn as_any(&self) -> &dyn Any;
+ /// Returns the extension for this FileFormat, e.g. "file.csv" -> csv
+ fn get_ext(&self) -> String;
+
+ /// Returns the extension for this FileFormat when compressed, e.g.
"file.csv.gz" -> csv
+ fn get_ext_with_compression(
+ &self,
+ _file_compression_type: &FileCompressionType,
+ ) -> Result<String>;
Review Comment:
Same question about owned String
##########
datafusion/core/src/datasource/file_format/mod.rs:
##########
@@ -58,6 +74,15 @@ pub trait FileFormat: Send + Sync + fmt::Debug {
/// downcast to a specific implementation.
fn as_any(&self) -> &dyn Any;
+ /// Returns the extension for this FileFormat, e.g. "file.csv" -> csv
+ fn get_ext(&self) -> String;
Review Comment:
Minor: Is there any reason this has to return an owned `String`? Could it
return `&str` ?
##########
datafusion/core/src/datasource/file_format/mod.rs:
##########
@@ -106,6 +131,67 @@ pub trait FileFormat: Send + Sync + fmt::Debug {
}
}
+/// A container of [FileFormatFactory] which also implements [FileType].
+/// This enables converting a dyn FileFormat to a dyn FileType.
+/// The former trait is a superset of the latter trait, which includes
execution time
+/// relevant methods. [FileType] is only used in logical planning and only
implements
+/// the subset of methods required during logical planning.
+pub struct DefaultFileFormat {
Review Comment:
I found the naming confusing -- I would have expected that
`DefaultFileFormat` implemented `FileFormatFactory` or something
Maybe we could call this `FileFormatSource` ?
##########
datafusion/core/src/datasource/file_format/mod.rs:
##########
@@ -106,6 +131,67 @@ pub trait FileFormat: Send + Sync + fmt::Debug {
}
}
+/// A container of [FileFormatFactory] which also implements [FileType].
+/// This enables converting a dyn FileFormat to a dyn FileType.
+/// The former trait is a superset of the latter trait, which includes
execution time
+/// relevant methods. [FileType] is only used in logical planning and only
implements
+/// the subset of methods required during logical planning.
+pub struct DefaultFileFormat {
+ file_format_factory: Arc<dyn FileFormatFactory>,
+}
+
+impl DefaultFileFormat {
+ /// Constructs a [DefaultFileFormat] wrapper from a [FileFormatFactory]
+ pub fn new(file_format_factory: Arc<dyn FileFormatFactory>) -> Self {
+ Self {
+ file_format_factory,
+ }
+ }
+}
+
+impl FileType for DefaultFileFormat {
Review Comment:
Rather than this wrapper, I wonder if we could implement `FileType` for `dyn
FileFormatFactory`
```rust
impl FileType for &dyn FileFormatFactory{
...
}
```
And avoid having to have this struct 🤔
##########
datafusion/core/src/execution/session_state.rs:
##########
@@ -237,6 +248,33 @@ impl SessionState {
function_factory: None,
};
+ #[cfg(feature = "parquet")]
Review Comment:
👍
##########
datafusion/core/src/execution/session_state.rs:
##########
@@ -237,6 +248,33 @@ impl SessionState {
function_factory: None,
};
+ #[cfg(feature = "parquet")]
+ match
new_self.register_file_format(Arc::new(ParquetFormatFactory::new()), false)
+ {
+ Ok(_) => (),
+ Err(e) => log::info!("Unable to register default ParquetFormat:
{e}"),
+ };
Review Comment:
Another way to write this would be
```suggestion
if let Err(e) =
new_self.register_file_format(Arc::new(ParquetFormatFactory::new()), false)
{
log::info!("Unable to register default ParquetFormat: {e}"),
};
```
which might be more concise 🤔
##########
datafusion/core/src/physical_planner.rs:
##########
@@ -786,32 +779,9 @@ impl DefaultPhysicalPlanner {
table_partition_cols,
overwrite: false,
};
- let mut table_options = session_state.default_table_options();
Review Comment:
this is great this code has been removed from the default planner
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]