paleolimbot commented on issue #8730:
URL: https://github.com/apache/arrow-rs/issues/8730#issuecomment-3769070052

   > What issues are we running into with using Field
   
   There are no issues that haven't already been brought up in this thread. 
`Field`, `DataType`, and the existing APIs that use them (in the arrow crate 
and elsewhere) need awkward accommodations to implement extension types. 
Notably, that means either passing a reference to a registry through every 
DataType/Field/Schema/RecordBatch/ArrayRef operation or inventing a new 
DataType/Field/Schema/RecordBatch/ArrayRef stack that supports `Extension(dyn 
Any)`.
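   
   To make the first accommodation concrete, here is a minimal sketch of what 
threading a registry through an operation could look like. It uses only the 
standard library: `SketchField`, `SketchRegistry`, `sketch_field`, `bracketed` 
and `format_value` are hypothetical stand-ins, not arrow-rs APIs. The only 
name taken from Arrow itself is the `ARROW:extension:name` metadata key.

```rust
use std::collections::HashMap;

// Stand-in for a field: a name plus string metadata.
// This is NOT the arrow-rs `Field` type.
struct SketchField {
    name: String,
    metadata: HashMap<String, String>,
}

// Hypothetical helper that builds a field, optionally tagged as an extension.
fn sketch_field(name: &str, extension_name: Option<&str>) -> SketchField {
    let mut metadata = HashMap::new();
    if let Some(ext) = extension_name {
        metadata.insert("ARROW:extension:name".to_string(), ext.to_string());
    }
    SketchField { name: name.to_string(), metadata }
}

// Hypothetical registry: extension name -> function that formats a raw value.
struct SketchRegistry {
    formatters: HashMap<String, fn(&str) -> String>,
}

impl SketchRegistry {
    fn new() -> Self {
        SketchRegistry { formatters: HashMap::new() }
    }

    fn register(&mut self, extension_name: &str, formatter: fn(&str) -> String) {
        self.formatters.insert(extension_name.to_string(), formatter);
    }
}

// Example formatter for a made-up extension type.
fn bracketed(raw: &str) -> String {
    format!("<{}>", raw)
}

// The registry is an extra argument here, and therefore in every caller too.
fn format_value(field: &SketchField, raw: &str, registry: &SketchRegistry) -> String {
    let formatter = field
        .metadata
        .get("ARROW:extension:name")
        .and_then(|name| registry.formatters.get(name))
        .copied();
    match formatter {
        Some(format_fn) => format_fn(raw),
        // Unknown or absent extension: fall back to the storage representation.
        None => raw.to_string(),
    }
}
```

   In this sketch, any function that calls `format_value` has to accept and 
forward the registry as well, which is the kind of signature churn the 
paragraph above describes.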
   
   We are now up to 6 canonical extension types with at least one more being 
discussed on the mailing list. I think it is reasonable that people (not just 
DataFusion) want to cast, print, serialize to/from JSON, read and write CSVs, 
etc. without having to rewrite their APIs. For many users of arrow-rs, 
extension types are no longer metadata that can just be dropped when 
convenient.
   
   > would the extensions suggestion work?
   
   I don't think the argument is that it is impossible to do this purely 
based on serialized metadata or by embedding something in the field as 
described here. It's that the people willing to put in the work to implement 
extension types aren't interested in doing it that way (judging purely by how 
long some of these tickets have been open).
   
   My specific objection to embedding an extension type instance on a `Field` 
is that one would *still* have to go through every downstream codebase and 
rewrite every DataType/Field/Schema/RecordBatch/ArrayRef operation to ensure 
the type is propagated. At that point we may as well invent our own (e.g.) 
LogicalType/DFField/DFSchema stack. Because no such stack exists in arrow-rs, 
it has to be rewritten for every downstream codebase (e.g., I've seen some 
version of an `(ArrayRef, FieldRef)` pair defined at least three times to make 
writing array-level APIs less painful).
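   
   For illustration, a rough sketch of the kind of array-plus-field pair 
mentioned above, again using only the standard library. `SketchField`, 
`SketchArrayRef`, `SketchFieldRef` and `ArrayWithField` are hypothetical 
stand-ins (the "array" is just a `Vec<i64>`), not types from arrow-rs or any 
downstream project.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Stand-ins for the real types; NOT the arrow-rs `Field` or `ArrayRef`.
struct SketchField {
    name: String,
    metadata: HashMap<String, String>,
}
type SketchFieldRef = Arc<SketchField>;
type SketchArrayRef = Arc<Vec<i64>>;

// Hypothetical pair that keeps the values and their field together.
struct ArrayWithField {
    array: SketchArrayRef,
    field: SketchFieldRef,
}

impl ArrayWithField {
    fn new(values: Vec<i64>, name: &str, extension_name: Option<&str>) -> Self {
        let mut metadata = HashMap::new();
        if let Some(ext) = extension_name {
            metadata.insert("ARROW:extension:name".to_string(), ext.to_string());
        }
        ArrayWithField {
            array: Arc::new(values),
            field: Arc::new(SketchField { name: name.to_string(), metadata }),
        }
    }

    // The extension name, if the field metadata carries one.
    fn extension_name(&self) -> Option<&str> {
        self.field
            .metadata
            .get("ARROW:extension:name")
            .map(String::as_str)
    }

    // An array-level operation: keep at most the first `n` values.
    // The field travels with the result, so the extension type is not lost.
    fn take_first(&self, n: usize) -> ArrayWithField {
        let values: Vec<i64> = self.array.iter().take(n).copied().collect();
        ArrayWithField {
            array: Arc::new(values),
            field: Arc::clone(&self.field),
        }
    }
}
```

   Every operation written against a bare array would need a counterpart like 
`take_first` that forwards the field, which is why each codebase ends up 
reinventing some version of this wrapper.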
   
   > What if we just said "F-it" and added a DataType::Extension(dyn Any) to 
the arrow crate? 
   
   I'm happy to put the work in to make that happen (implementation or review) 
whenever it feels like that work has a chance of getting merged. This can start 
behind a feature flag, so nobody who doesn't want this mechanism has to use it.
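   
   As a rough sketch of the shape such a variant could take (not a proposal 
for the actual arrow-rs definition): a bare `dyn Any` is unsized, so the 
payload would need to sit behind a pointer such as `Arc`. `SketchDataType`, 
`ExampleExtension` and `as_extension` below are hypothetical names, and the 
feature gate is only described in a comment.

```rust
use std::any::Any;
use std::sync::Arc;

// Stand-in for a data type enum; NOT the arrow-rs `DataType`.
#[derive(Clone)]
enum SketchDataType {
    Int64,
    Utf8,
    // In a real crate this variant could be gated behind a cargo feature
    // (for example `#[cfg(feature = "...")]` with a feature name of the
    // maintainers' choosing), so users who do not opt in never see it.
    Extension(Arc<dyn Any + Send + Sync>),
}

// A made-up extension type with one parameter.
#[derive(Debug, PartialEq)]
struct ExampleExtension {
    parameter: String,
}

impl SketchDataType {
    // Recover the concrete extension type, if this is an extension of type `T`.
    fn as_extension<T: Any>(&self) -> Option<&T> {
        match self {
            SketchDataType::Extension(inner) => inner.downcast_ref::<T>(),
            _ => None,
        }
    }
}
```

   With this shape the extension instance is carried by the type itself, so 
code that clones or forwards a `SketchDataType` propagates it without a 
registry lookup; the cost is that every exhaustive `match` on the enum has to 
handle the new variant.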


