tobixdev commented on code in PR #20312:
URL: https://github.com/apache/datafusion/pull/20312#discussion_r2994791692
##########
datafusion/expr/src/registry.rs:
##########
@@ -215,3 +218,280 @@ impl FunctionRegistry for MemoryFunctionRegistry {
self.udwfs.keys().cloned().collect()
}
}
+
+/// A cheaply cloneable pointer to an [ExtensionTypeRegistration].
+pub type ExtensionTypeRegistrationRef = Arc<dyn ExtensionTypeRegistration>;
+
+/// The registration of an extension type. Implementations of this trait are
responsible for
+/// *creating* instances of [`DFExtensionType`] that represent the entire
semantics of an extension
+/// type.
+///
+/// # Why do we need a Registration?
+///
+/// A good question is why this trait is even necessary. Why not directly
register the
+/// [`DFExtensionType`] in a registration?
+///
+/// While this works for extension types requiring no additional metadata
(e.g., `arrow.uuid`), it
+/// does not work for more complex extension types with metadata. For example,
consider an extension
+/// type `custom.shortened(n)` that aims to short the pretty-printing string
to `n` characters.
+/// Here, `n` is a parameter of the extension type and should be a field in
the struct that
+/// implements the [`DFExtensionType`]. The job of the registration is to read
the metadata from the
+/// field and create the corresponding [`DFExtensionType`] instance with the
correct `n` set.
+///
+/// The [`DefaultExtensionTypeRegistration`] provides a convenient way of
creating registrations.
+pub trait ExtensionTypeRegistration: Debug + Send + Sync {
+ /// The name of the extension type.
+ ///
+ /// This name will be used to find the correct [ExtensionTypeRegistration]
when an extension
+ /// type is encountered.
+ fn type_name(&self) -> &str;
+
+ /// Creates an extension type instance from the optional metadata. The
name of the extension
+ /// type is not a parameter as it's already defined by the registration
itself.
+ fn create_df_extension_type(
+ &self,
+ metadata: Option<&str>,
+ ) -> Result<DFExtensionTypeRef>;
+}
Review Comment:
Ok yeah this seems to be the case. I didn't know!
For others:
```rust
pub struct FixedShapeTensor {
/// The data type of individual tensor elements.
value_type: DataType,
/// The metadata of this extension type.
metadata: FixedShapeTensorMetadata,
}
```
I agree that these two concepts should align as much as possible. I've used
the name `serialize_metadata(&self) -> Option<String>` without (d) so that the
name is shared.
##########
datafusion/expr/src/registry.rs:
##########
@@ -215,3 +218,280 @@ impl FunctionRegistry for MemoryFunctionRegistry {
self.udwfs.keys().cloned().collect()
}
}
+
+/// A cheaply cloneable pointer to an [ExtensionTypeRegistration].
+pub type ExtensionTypeRegistrationRef = Arc<dyn ExtensionTypeRegistration>;
+
+/// The registration of an extension type. Implementations of this trait are
responsible for
+/// *creating* instances of [`DFExtensionType`] that represent the entire
semantics of an extension
+/// type.
+///
+/// # Why do we need a Registration?
+///
+/// A good question is why this trait is even necessary. Why not directly
register the
+/// [`DFExtensionType`] in a registration?
+///
+/// While this works for extension types requiring no additional metadata
(e.g., `arrow.uuid`), it
+/// does not work for more complex extension types with metadata. For example,
consider an extension
+/// type `custom.shortened(n)` that aims to short the pretty-printing string
to `n` characters.
+/// Here, `n` is a parameter of the extension type and should be a field in
the struct that
+/// implements the [`DFExtensionType`]. The job of the registration is to read
the metadata from the
+/// field and create the corresponding [`DFExtensionType`] instance with the
correct `n` set.
+///
+/// The [`DefaultExtensionTypeRegistration`] provides a convenient way of
creating registrations.
+pub trait ExtensionTypeRegistration: Debug + Send + Sync {
+ /// The name of the extension type.
+ ///
+ /// This name will be used to find the correct [ExtensionTypeRegistration]
when an extension
+ /// type is encountered.
+ fn type_name(&self) -> &str;
+
+ /// Creates an extension type instance from the optional metadata. The
name of the extension
+ /// type is not a parameter as it's already defined by the registration
itself.
+ fn create_df_extension_type(
+ &self,
+ metadata: Option<&str>,
+ ) -> Result<DFExtensionTypeRef>;
+}
Review Comment:
Oh yeah this seems to be the case. I didn't know! Thanks!
For others:
```rust
pub struct FixedShapeTensor {
/// The data type of individual tensor elements.
value_type: DataType,
/// The metadata of this extension type.
metadata: FixedShapeTensorMetadata,
}
```
I agree that these two concepts should align as much as possible. I've used
the name `serialize_metadata(&self) -> Option<String>` without (d) so that the
name is shared.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]