paleolimbot commented on issue #18223: URL: https://github.com/apache/datafusion/issues/18223#issuecomment-3439065685
Thank you for writing this up, and thank you @tobixdev for https://github.com/apache/datafusion/pull/15106 and the reviews along the way (I think that PR hits the mark in a number of ways!).

I wonder if a low-impact and reasonably satisfying way to start would be:

```rust
// Maybe the lifetimes will be too annoying here, but the idea is that this can cheaply
// represent any of the combinations of ArrayRef/FieldMetadata/FieldRef/ScalarValue/DataType
// without cloning anything. This is basically nanoarrow's ArrowSchemaView :)
pub struct SerializedTypeView<'a, 'b, 'c> {
    arrow_type: &'a DataType,
    extension_name: Option<&'b str>,
    extension_metadata: Option<&'c str>,
}

pub trait TypeExtensions {
    // None means just use the storage type implementation. Maybe Box<> or &'static could work here
    fn pretty_print_extension(&self, extension_name: &str) -> Option<Arc<dyn PrettyPrintExtension>> {
        None
    }

    // ...a future version could also return an Option<Arc<dyn CustomOrdering>> from #18124

    // ...and in some magical future maybe we can just do
    // fn logical_type(&self, type_view: &SerializedTypeView) -> Result<Arc<dyn ThePerfectLogicalTypeTrait>>;
    // ...where the LogicalType knows how to create LogicalArray/Scalars that can do this stuff on their own
    // with convenient APIs so people actually use them.
}

pub trait PrettyPrintExtension {
    // Probably there's a more established pretty print API.
    fn pretty_print_serialized(&self, type_view: &SerializedTypeView, storage: &ArrayRef, options_of_some_kind) -> Result<ArrayRef>;

    // ...a separate trait allows this API to evolve without affecting TypeExtensions
}
```

It might be particularly satisfying to implement that for variant so that queries against the test files in the CLI show nice pretty JSON. It would be less satisfying, but possibly easier, to implement that for UUID.

It also makes it so we don't have to come up with `ThePerfectLogicalTypeTrait`, because that is hard and right now we represent type information in a lot of different ways.
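
To make the UUID case a bit more concrete, here is a rough, std-only sketch of how the lookup-then-format flow could work. It deliberately does not use the real arrow or datafusion crates: `TypeView`, `PrettyPrint`, `UuidPrettyPrint`, `Registry`, and `render` are all hypothetical stand-ins for the proposed `SerializedTypeView`, `PrettyPrintExtension`, and `TypeExtensions`, with the storage array reduced to a slice of byte vectors and the output reduced to a `Vec<String>`. The only thing taken from Arrow itself is the canonical extension name `arrow.uuid` and its 16-byte storage.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-in for the proposed SerializedTypeView. The storage type
// is reduced to a byte width instead of a borrowed arrow DataType.
pub struct TypeView<'a> {
    pub storage_byte_width: usize,
    pub extension_name: Option<&'a str>,
    pub extension_metadata: Option<&'a str>,
}

// Hypothetical stand-in for PrettyPrintExtension, working on raw bytes rather
// than an ArrayRef.
pub trait PrettyPrint {
    fn pretty_print(&self, view: &TypeView, storage: &[Vec<u8>]) -> Result<Vec<String>, String>;
}

pub struct UuidPrettyPrint;

impl PrettyPrint for UuidPrettyPrint {
    fn pretty_print(&self, view: &TypeView, storage: &[Vec<u8>]) -> Result<Vec<String>, String> {
        if view.storage_byte_width != 16 {
            return Err(format!("expected 16-byte storage, got {}", view.storage_byte_width));
        }
        // Collecting into a Result stops at the first malformed value.
        storage.iter().map(|bytes| format_uuid(bytes)).collect()
    }
}

// Formats 16 bytes as the usual 8-4-4-4-12 lowercase hex form.
fn format_uuid(bytes: &[u8]) -> Result<String, String> {
    if bytes.len() != 16 {
        return Err(format!("expected 16 bytes, got {}", bytes.len()));
    }
    let hex: String = bytes.iter().map(|b| format!("{:02x}", b)).collect();
    Ok(format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8],
        &hex[8..12],
        &hex[12..16],
        &hex[16..20],
        &hex[20..32]
    ))
}

// Hypothetical registry playing the role of TypeExtensions: it maps an
// extension name to an optional pretty printer.
pub struct Registry {
    printers: HashMap<String, Arc<dyn PrettyPrint>>,
}

impl Registry {
    pub fn new() -> Self {
        let mut printers: HashMap<String, Arc<dyn PrettyPrint>> = HashMap::new();
        printers.insert("arrow.uuid".to_string(), Arc::new(UuidPrettyPrint));
        Registry { printers }
    }

    // None means "just use the storage type implementation".
    pub fn pretty_print_extension(&self, extension_name: &str) -> Option<Arc<dyn PrettyPrint>> {
        self.printers.get(extension_name).cloned()
    }
}

// Uses the extension printer when one is registered, otherwise falls back to
// a plain hex dump of the storage bytes.
pub fn render(registry: &Registry, view: &TypeView, storage: &[Vec<u8>]) -> Result<Vec<String>, String> {
    match view.extension_name.and_then(|name| registry.pretty_print_extension(name)) {
        Some(printer) => printer.pretty_print(view, storage),
        None => Ok(storage
            .iter()
            .map(|bytes| bytes.iter().map(|b| format!("{:02x}", b)).collect::<String>())
            .collect()),
    }
}
```

Calling `render` with a view whose `extension_name` is `Some("arrow.uuid")` goes through `UuidPrettyPrint`, while a view with no extension name (or an unregistered one) gets the storage fallback. That fallback is the part of the proposal this sketch is meant to illustrate: callers never need to know whether a given extension has a printer.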
