paleolimbot commented on issue #8730: URL: https://github.com/apache/arrow-rs/issues/8730#issuecomment-3769070052
> What issues are we running into with using Field

There are no issues that haven't been brought up in this thread...the existing use of the `Field`, `DataType` and the existing APIs that use them (in the arrow crate and otherwise) need to make awkward accommodations to implement extension types. Notably, this is passing a reference to a registry through every DataType/Field/Schema/RecordBatch/ArrayRef operation or inventing a new DataType/Field/Schema/RecordBatch/ArrayRef stack that supports `Extension(dyn Any)`. We are now up to 6 canonical extension types with at least one more being discussed on the mailing list...I think it is reasonable that people (not just DataFusion) want to cast, print, serialize to/from JSON, read and write CSVs, etc. without having to rewrite their APIs. Extension types are no longer metadata that can just be dropped when convenient for many users of arrow-rs.

> would the extensions suggestion work?

I don't think the argument is that it is not possible to do this purely based on serialized metadata or embedding something in the field as described here, it's that the people willing to put in the work to implement extension types aren't interested in doing so (purely based on how long some of these tickets have been open).

My specific objection to embedding an extension type instance on a `Field` is that one would *still* have to go through every downstream codebase and rewrite every DataType/Field/Schema/RecordBatch/ArrayRef operation to ensure the type was propagated. At the point we're doing that we may as well invent our own (e.g.) LogicalType/DFField/DFSchema stack. Because no such stack exists in arrow-rs, that has to be rewritten for every downstream code base (e.g., I've seen some version of `(ArrayRef, FieldRef)` combination defined to make writing array-level APIs less painful at least three times).

> What if we just said "F-it" and added a DataType::Extension(dyn Any) to the arrow crate?
I'm happy to put the work in to make that happen (implementation or review) whenever it feels like that work has a chance of getting merged. This can start behind a feature flag and nobody that doesn't want to use this mechanism has to.
