tobixdev commented on code in PR #20312:
URL: https://github.com/apache/datafusion/pull/20312#discussion_r3010143732
##########
datafusion/expr/src/registry.rs:
##########
@@ -215,3 +218,280 @@ impl FunctionRegistry for MemoryFunctionRegistry {
self.udwfs.keys().cloned().collect()
}
}
+
+/// A cheaply cloneable pointer to an [ExtensionTypeRegistration].
+pub type ExtensionTypeRegistrationRef = Arc<dyn ExtensionTypeRegistration>;
+
+/// The registration of an extension type. Implementations of this trait are
responsible for
+/// *creating* instances of [`DFExtensionType`] that represent the entire
semantics of an extension
+/// type.
+///
+/// # Why do we need a Registration?
+///
+/// A good question is why this trait is even necessary. Why not directly
register the
+/// [`DFExtensionType`] in a registration?
+///
+/// While this works for extension types requiring no additional metadata
(e.g., `arrow.uuid`), it
+/// does not work for more complex extension types with metadata. For example,
consider an extension
+/// type `custom.shortened(n)` that aims to short the pretty-printing string
to `n` characters.
+/// Here, `n` is a parameter of the extension type and should be a field in
the struct that
+/// implements the [`DFExtensionType`]. The job of the registration is to read
the metadata from the
+/// field and create the corresponding [`DFExtensionType`] instance with the
correct `n` set.
+///
+/// The [`DefaultExtensionTypeRegistration`] provides a convenient way of
creating registrations.
+pub trait ExtensionTypeRegistration: Debug + Send + Sync {
+ /// The name of the extension type.
+ ///
+ /// This name will be used to find the correct [ExtensionTypeRegistration]
when an extension
+ /// type is encountered.
+ fn type_name(&self) -> &str;
+
+ /// Creates an extension type instance from the optional metadata. The
name of the extension
+ /// type is not a parameter as it's already defined by the registration
itself.
+ fn create_df_extension_type(
+ &self,
+ metadata: Option<&str>,
+ ) -> Result<DFExtensionTypeRef>;
+}
Review Comment:
@paleolimbot I am currently trying to [implement `DFExtensionType` for the
remaining canonical extension
types](https://github.com/apache/datafusion/issues/21144) and apparently some
canonical extension types don't seem to store their storage type.
For example, `arrow_schema::extension::Json` is only defines like this:
```rust
pub struct Json(JsonMetadata);
#[derive(Debug, Default, Clone, PartialEq)]
pub struct JsonMetadata(Option<Empty>);
```
Now, JSON allows `String` or `LargeString` or `StringView` as storage type
and because `Json` does not store this storage type I cannot implement our
trait. I could create a wrapping `DFJson` (or similar) struct that simply
stores the storage type alongside the metadata but then we would no longer use
the same structs.
There is also a similar issues for `arrow.opaque` and
`arrow.parquet.variant` (we must know which fields the struct has).
Do you have any more thoughts on that?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]