tobixdev commented on code in PR #20312:
URL: https://github.com/apache/datafusion/pull/20312#discussion_r3010143732


##########
datafusion/expr/src/registry.rs:
##########
@@ -215,3 +218,280 @@ impl FunctionRegistry for MemoryFunctionRegistry {
         self.udwfs.keys().cloned().collect()
     }
 }
+
+/// A cheaply cloneable pointer to an [ExtensionTypeRegistration].
+pub type ExtensionTypeRegistrationRef = Arc<dyn ExtensionTypeRegistration>;
+
+/// The registration of an extension type. Implementations of this trait are 
responsible for
+/// *creating* instances of [`DFExtensionType`] that represent the entire 
semantics of an extension
+/// type.
+///
+/// # Why do we need a Registration?
+///
+/// A good question is why this trait is even necessary. Why not directly 
register the
+/// [`DFExtensionType`] in a registration?
+///
+/// While this works for extension types requiring no additional metadata 
(e.g., `arrow.uuid`), it
+/// does not work for more complex extension types with metadata. For example, 
consider an extension
+/// type `custom.shortened(n)` that aims to short the pretty-printing string 
to `n` characters.
+/// Here, `n` is a parameter of the extension type and should be a field in 
the struct that
+/// implements the [`DFExtensionType`]. The job of the registration is to read 
the metadata from the
+/// field and create the corresponding [`DFExtensionType`] instance with the 
correct `n` set.
+///
+/// The [`DefaultExtensionTypeRegistration`] provides a convenient way of 
creating registrations.
+pub trait ExtensionTypeRegistration: Debug + Send + Sync {
+    /// The name of the extension type.
+    ///
+    /// This name will be used to find the correct [ExtensionTypeRegistration] 
when an extension
+    /// type is encountered.
+    fn type_name(&self) -> &str;
+
+    /// Creates an extension type instance from the optional metadata. The 
name of the extension
+    /// type is not a parameter as it's already defined by the registration 
itself.
+    fn create_df_extension_type(
+        &self,
+        metadata: Option<&str>,
+    ) -> Result<DFExtensionTypeRef>;
+}

Review Comment:
   @paleolimbot I am currently trying to [implement `DFExtensionType` for the 
remaining canonical extension 
types](https://github.com/apache/datafusion/issues/21144) and apparently some 
canonical extension types don't seem to store their storage type. 
   
   For example, `arrow_schema::extension::Json` is only defines like this:
   
   ```rust
   pub struct Json(JsonMetadata);
   
   #[derive(Debug, Default, Clone, PartialEq)]
   pub struct JsonMetadata(Option<Empty>);
   ```
   
   Now, JSON allows `String` or `LargeString` or `StringView` as storage type 
and because `Json` does not store this storage type I cannot implement our 
trait. I could create a wrapping `DFJson` (or similar) struct that simply 
stores the storage type alongside the metadata but then we would no longer use 
the same structs. 
   
   There is also a similar issues for `arrow.opaque` and 
`arrow.parquet.variant` (we must know which fields the struct has).
   
   Do you have any more thoughts on that?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to