paleolimbot commented on issue #8730: URL: https://github.com/apache/arrow-rs/issues/8730#issuecomment-3769906303
> I presume the proposal is to add DataType::Extension(Box<DataType>, Box<dyn Any>)

There are a few ways to do this, but in essence a DataType enum variant with something dynamic (e.g., https://github.com/apache/arrow-rs/pull/7398 ).

> If so how do these arrays get created?

There are a number of places that have implemented Arrow extension types with a registry (Polars Arrow, Arrow C++, Arrow Go, Arrow Java, DuckDB, nanoarrow R, nanoarrow Python) to draw on to answer this question. Briefly, any operation that produces a schema or a record batch from something that isn't an arrow-rs object is responsible for providing a registry to resolve an extension type. In practice that works quite well (arrow-rs is great at propagating its own DataType objects).

The problem of metadata/type mismatch is not unique to a dynamic Extension data type and even exists today: any attempt to set metadata might obliterate type information that an application is relying on. In fact, a dynamic DataType extension is specifically designed to alleviate the most common version of that issue (which is dropping the extension type on every call to `.data_type()` or `.column()`).

> If lack of engagement is the issue

I don't think it is a lack of interested people; it is that the two options mentioned above (rewrite the DataType/Field/Schema/Array/RecordBatch stack and related APIs) are sufficiently disruptive that nobody wants to review them, or sufficiently hacky that nobody wants to implement them (perhaps just speaking for myself on this last point).

> but how downstreams choose to glue this together is not prescribed by arrow-rs.

Providing an optional dynamic DataType extension is not prescribing a mechanism to glue together extension handling; it is providing a tool to propagate type implementations that applications can choose to opt into without writing a parallel DataType/Field/Schema/Array/RecordBatch stack.
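
To make the registry idea above concrete, here is a rough sketch of how a reader at the boundary might resolve an extension type once and then carry it inside the data type. Everything in it is a stand-in written for illustration: `DataType`, `Field`, `ExtensionType`, `ExtensionRegistry`, `uuid_factory`, and `extension_name` are hypothetical simplified types, not arrow-rs APIs, and the `Extension` variant only assumes the general shape discussed in this thread. The `ARROW:extension:name` and `ARROW:extension:metadata` keys are the ones defined by the Arrow format.

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::sync::Arc;

// Field metadata keys defined by the Arrow columnar format.
pub const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";
pub const EXTENSION_METADATA_KEY: &str = "ARROW:extension:metadata";

/// Hypothetical trait for a dynamic extension type implementation.
pub trait ExtensionType: Debug + Send + Sync {
    fn name(&self) -> &str;
}

/// Simplified stand-in for a DataType enum with a dynamic variant.
#[derive(Debug, Clone)]
pub enum DataType {
    Utf8,
    FixedSizeBinary(i32),
    /// Storage type plus the resolved extension implementation.
    Extension(Box<DataType>, Arc<dyn ExtensionType>),
}

/// Simplified stand-in for a field as read from a non-arrow-rs source.
pub struct Field {
    pub name: String,
    pub storage: DataType,
    pub metadata: HashMap<String, String>,
}

impl Field {
    pub fn new(name: &str, storage: DataType) -> Self {
        Field { name: name.to_string(), storage, metadata: HashMap::new() }
    }

    pub fn with_metadata(mut self, key: &str, value: &str) -> Self {
        self.metadata.insert(key.to_string(), value.to_string());
        self
    }
}

/// A factory validates the storage type and serialized metadata, returning
/// `None` when they do not match what the extension expects.
pub type Factory = fn(&DataType, Option<&str>) -> Option<Arc<dyn ExtensionType>>;

/// Hypothetical registry supplied by whoever produces schemas at the boundary.
#[derive(Default)]
pub struct ExtensionRegistry {
    factories: HashMap<String, Factory>,
}

impl ExtensionRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn register(&mut self, name: &str, factory: Factory) {
        self.factories.insert(name.to_string(), factory);
    }

    /// Resolve a field to an extension data type, falling back to the plain
    /// storage type when the name is unregistered or the factory rejects it.
    pub fn resolve(&self, field: &Field) -> DataType {
        let storage = field.storage.clone();
        let Some(name) = field.metadata.get(EXTENSION_NAME_KEY) else {
            return storage;
        };
        let meta = field.metadata.get(EXTENSION_METADATA_KEY).map(String::as_str);
        let resolved = self
            .factories
            .get(name.as_str())
            .and_then(|make| make(&storage, meta));
        match resolved {
            Some(ext) => DataType::Extension(Box::new(storage), ext),
            None => storage,
        }
    }
}

/// Example implementation for the canonical `arrow.uuid` extension.
#[derive(Debug)]
pub struct Uuid;

impl ExtensionType for Uuid {
    fn name(&self) -> &str {
        "arrow.uuid"
    }
}

/// Accepts only 16-byte fixed-size binary storage.
pub fn uuid_factory(storage: &DataType, _meta: Option<&str>) -> Option<Arc<dyn ExtensionType>> {
    match storage {
        DataType::FixedSizeBinary(16) => {
            let ext: Arc<dyn ExtensionType> = Arc::new(Uuid);
            Some(ext)
        }
        _ => None,
    }
}

/// Returns the extension name if the data type carries one.
pub fn extension_name(data_type: &DataType) -> Option<&str> {
    match data_type {
        DataType::Extension(_, ext) => Some(ext.name()),
        _ => None,
    }
}
```

In this sketch the registry is consulted only once, at the point where a field arrives from outside. After that the extension travels with the `DataType` value itself, so code that only passes data types around never has to re-read field metadata. Unregistered or mismatched extensions degrade to the storage type rather than failing, which is one possible policy among several.
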
This certainly could be integrated with arrow-rs native APIs but I don't think any of us are suggesting that as a part of this proposal.
