tobixdev commented on issue #18223: URL: https://github.com/apache/datafusion/issues/18223#issuecomment-3452344643
> One possibility is to add a DFExtensionType trait, that extends the existing [ExtensionType](https://docs.rs/arrow-schema/56.2.0/arrow_schema/struct.Field.html#method.data_type) trait, similar to [DFSchema](https://docs.rs/datafusion/latest/datafusion/common/struct.DFSchema.html)

We have one problem here: `ExtensionType` is not dyn-compatible because it uses an associated constant and an associated type. I believe we need dyn-compatibility, as our registry will look something like `Arc<dyn ExtensionTypeThingey>`.

I've experimented a bit more with this and ran into the custom-printing problem that we discussed earlier. If I want to define a custom string representation for a type, the printing logic needs access to it. Currently, printing happens in the respective Debug/Display implementations, and they do not have access to a registry. Therefore, from my perspective, there are only two sane options:

1. Use some kind of pretty-printing visitor that has access to the registry.
2. Store the extension directly in the `Field` or a possible `DFField`.

I think 2. would be the better approach, but I may be mistaken. Here, a problem is that the "use `Field` instead of `DataType`" approach would likely not be enough if `Field` cannot provide access to a `dyn ExtensionTypeThingey`. This would require another round of "replace `Field` with `PowerfulField`", which I think we want to avoid if possible.

So, can we use `Field` as a carrier for our DataFusion extension type trait? Not in its current form, but I've created an example of how it could look: https://github.com/apache/arrow-rs/compare/main...tobixdev:arrow-rs:crazy-field-experiment?expand=1

Users of `arrow-rs` can then provide their own enriched extension type traits without managing this themselves. Of course, it has multiple drawbacks:

- Downcasting can create problems that a `DFDataType` enum would prevent.
- I got an error with unwind safety in the tests due to the dynamic dispatch (see diff).
- It is less ergonomic and more complex, as another trait exists.

Any thoughts on that, @alamb @paleolimbot?

For Option 2, I think the alternative would be to have a `DFField` and a `DFDataType`. I think this could also be fine if we use them in `DFSchema`.

@paleolimbot Thanks for your input 👍! I am currently exploring Option 2, but if we choose to go with Option 1 (use a registry / `TypeExtensions`), this could be an interesting approach!
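
To make the dyn-compatibility point concrete, here is a minimal sketch of the general idea, not the code from the linked experiment. It does not depend on `arrow-rs`: `StaticExtensionType` is only a simplified stand-in for the shape of `ExtensionType`, and `DynExtensionType`, `ExtensionRegistry`, `Uuid`, `format_value` and `demo` are hypothetical names chosen for illustration. It shows a dyn-compatible trait that can live in a registry as `Arc<dyn ...>`, and a printing function that consults the registry (roughly Option 1).

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::sync::Arc;

// Simplified stand-in for the *shape* of arrow's `ExtensionType` (not its real
// definition). The associated constant makes the trait not dyn-compatible, and
// the associated type would have to be fixed in any `dyn` type.
trait StaticExtensionType {
    const NAME: &'static str;
    type Metadata: Debug;
    fn metadata(&self) -> &Self::Metadata;
}

// Hypothetical dyn-compatible counterpart: it only has `&self` methods, so it
// can be stored as `Arc<dyn DynExtensionType>`.
trait DynExtensionType: Debug + Send + Sync {
    fn name(&self) -> &'static str;
    // Custom string representation for a raw value of this type.
    fn format_value(&self, raw: &str) -> String;
}

// Hypothetical extension type used only for this example.
#[derive(Debug)]
struct Uuid;

impl StaticExtensionType for Uuid {
    const NAME: &'static str = "example.uuid";
    type Metadata = ();
    fn metadata(&self) -> &Self::Metadata {
        &()
    }
}

impl DynExtensionType for Uuid {
    fn name(&self) -> &'static str {
        // Re-expose the associated constant through a method.
        <Self as StaticExtensionType>::NAME
    }
    fn format_value(&self, raw: &str) -> String {
        format!("uuid({raw})")
    }
}

// Hypothetical registry keyed by extension type name.
#[derive(Default)]
struct ExtensionRegistry {
    types: HashMap<String, Arc<dyn DynExtensionType>>,
}

impl ExtensionRegistry {
    fn register(&mut self, ty: Arc<dyn DynExtensionType>) {
        self.types.insert(ty.name().to_string(), ty);
    }

    // Printing with registry access: use the custom representation if the
    // type is registered, otherwise fall back to the raw value.
    fn format(&self, type_name: &str, raw: &str) -> String {
        match self.types.get(type_name) {
            Some(ty) => ty.format_value(raw),
            None => raw.to_string(),
        }
    }
}

// Builds a registry and formats one registered and one unregistered type.
fn demo() -> (String, String) {
    let mut registry = ExtensionRegistry::default();
    registry.register(Arc::new(Uuid));
    (
        registry.format("example.uuid", "1234"),
        registry.format("example.unknown", "1234"),
    )
}
```

In this sketch, `demo()` returns `("uuid(1234)", "1234")`. A plain `Display` implementation has no `ExtensionRegistry` in scope, which is why the comment considers either passing the registry to a printer (Option 1) or carrying the `Arc<dyn ...>` inside the field itself (Option 2).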
