alamb opened a new issue, #18223: URL: https://github.com/apache/datafusion/issues/18223
### Is your feature request related to a problem or challenge?

This is part of implementing LogicalTypes / Extension Types in DataFusion, as described by @findepi:
- https://github.com/apache/datafusion/issues/12644

[ExtensionType](https://docs.rs/arrow-schema/56.2.0/arrow_schema/extension/trait.ExtensionType.html)s are defined using the metadata on an arrow `Field` (not the `DataType`) and stored physically as one of the existing arrow types. This system has the nice benefit that extension types can be processed (passed through) by systems that don't support them as their underlying arrow type, and then additional semantics added by systems that do.

As people continue using DataFusion to implement more sophisticated extension types such as geometry and geography (@paleolimbot) and Variant (@friendlymatthew), they are finding it is important to customize certain operations that are currently hardcoded in DataFusion based on physical type.

Some examples of operations where special semantics are sometimes needed for extension types:
1. Printing / displaying values (e.g. printing Variant values in a JSON-like manner rather than as raw bytes)
2. Casting values to/from other types
3. Comparing values (e.g. it is not correct to compare two Variant values byte by byte)

There are a few challenges now:
1. Extension type information is carried on [`Field`](https://docs.rs/arrow-schema/56.2.0/arrow_schema/struct.Field.html) (rather than `DataType`), and the [`Field`](https://docs.rs/arrow-schema/56.2.0/arrow_schema/struct.Field.html) is not yet available everywhere (though @paleolimbot and others are working on this)
2. Even once we have `Field` available everywhere, the callsites for many cast/print and binary operations call directly into the arrow kernels, which have no way to customize behavior for extension types.

### Describe the solution you'd like

I think we need some sort of DataFusion API for users of extension types to specify and customize their behavior.
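To make the shape of such an API a bit more concrete, here is a rough, std-only sketch of the dispatch idea: look up the extension name in the field metadata and, if something is registered for it, use the custom behavior, otherwise fall back to the physical representation. Everything here (`SimpleField`, `ExtensionDisplay`, `ExtensionRegistry`, `Utf8Text`) is hypothetical and made up for illustration; none of it is an existing arrow or DataFusion API. The only real detail is the `ARROW:extension:name` metadata key that Arrow uses to tag a field with its extension type.

```rust
use std::collections::HashMap;

/// Metadata key Arrow uses to tag a field with an extension type name.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Simplified stand-in for arrow's `Field`: a name plus string metadata.
/// (Hypothetical; the real `Field` also carries a `DataType` and nullability.)
pub struct SimpleField {
    pub name: String,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical hook for customizing how values of an extension type are displayed.
pub trait ExtensionDisplay {
    fn format_value(&self, raw: &[u8]) -> String;
}

/// Hypothetical registry mapping extension type names to their custom behavior.
#[derive(Default)]
pub struct ExtensionRegistry {
    by_name: HashMap<String, Box<dyn ExtensionDisplay>>,
}

impl ExtensionRegistry {
    /// Register custom display behavior under an extension type name.
    pub fn register(&mut self, name: &str, ext: Box<dyn ExtensionDisplay>) {
        self.by_name.insert(name.to_string(), ext);
    }

    /// Format a value: use the registered extension if the field names one,
    /// otherwise fall back to printing the physical (raw bytes) representation.
    pub fn format(&self, field: &SimpleField, raw: &[u8]) -> String {
        field
            .metadata
            .get(EXTENSION_NAME_KEY)
            .and_then(|name| self.by_name.get(name))
            .map(|ext| ext.format_value(raw))
            .unwrap_or_else(|| format!("{:?}", raw))
    }
}

/// Toy extension that treats the stored bytes as UTF-8 text.
pub struct Utf8Text;

impl ExtensionDisplay for Utf8Text {
    fn format_value(&self, raw: &[u8]) -> String {
        String::from_utf8_lossy(raw).into_owned()
    }
}
```

The fallback branch is the important design point: a field with no extension metadata, or with an extension name nobody registered, keeps today's physical-type behavior, which preserves the pass-through property described above.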
### Describe alternatives you've considered

One possibility is to add a `DFExtensionType` trait that extends the existing [`ExtensionType`](https://docs.rs/arrow-schema/56.2.0/arrow_schema/extension/trait.ExtensionType.html) trait, similar to [`DFSchema`](https://docs.rs/datafusion/latest/datafusion/common/struct.DFSchema.html).

Maybe something like

```rust
use arrow_schema::{extension::ExtensionType, Field};
use datafusion_common::Result;
use datafusion_expr::ColumnarValue;

/// DataFusion Extension Type support
pub trait DFExtensionType: ExtensionType {
    /// Cast a column of this extension type to the target
    fn cast(&self, input: ColumnarValue, output_type: &Field) -> Result<ColumnarValue>;
    // .. other functions ...
}
```

We would also need some way to register these types dynamically with the `SessionContext`, as well as pass along the registry into the places they are needed.

```rust
// Proposed API: `register_extension` and `DFVariantExtension` do not exist yet
let ctx = SessionContext::new();
ctx.register_extension(Arc::new(DFVariantExtension));
...
```

I am not quite sure if this is the right API; we would probably need to try it out.

### Additional context

_No response_
