scovich commented on PR #9234: URL: https://github.com/apache/arrow-rs/pull/9234#issuecomment-3811891539
Skimming down the list of methods:

* Many code sites -- both internally and in user code -- use `Array::data_type` to drive casting decisions. Custom implementations of the `Array` trait risk causing panics in such code, unless `Array::as_any` actually returns the corresponding concrete type. See e.g. https://docs.rs/arrow/latest/arrow/array/macro.downcast_primitive_array.html
  * Corollary: Any type that attempts to `impl Array` as a "container" is utterly unusable as an actual array, because the original type could only be recovered if its `Array::as_any` returns a reference to the original type.
* `Array::into_data` and `Array::to_data` must produce a valid `ArrayData` (with a valid `ArrayData::data_type`), or risk causing panics and/or UB in downstream consumers such as `make_array`.
  * Corollary: `make_array` can never recover the original custom array type. It will instead recover whatever `ArrayData::data_type` indicated.
* `Array::is_empty`, `Array::len`, and `Array::offset` must be accurate, even if this array is the result of `Array::slice`
* `Array::nulls` must be accurate (any entry wrongly marked non-null has an undefined value)

Summing it all up -- any custom Array implementation must either be a complex analogue to `dyn Any` -- completely ignoring the normal Array API -- or must look and act _exactly_ like a newtype wrapper around one of the built-in Array types (e.g. `Arc<dyn Array>` can safely `impl Array`).

- There seems to be a connection here with https://github.com/apache/arrow-rs/issues/8794, where `Array::as_any` as a replacement for `Array: Any` causes awkwardness in casting?

Problem is -- there's no way to tell which one you're dealing with, for an arbitrary `&dyn Array`. The only way to be sure you have a usable `Array` is to round trip it to `ArrayData` and back before attempting to work with it (assuming `Array::into_data` is implemented correctly).
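To make the `data_type` / `as_any` mismatch concrete, here is a minimal sketch using toy stand-ins rather than the real arrow-rs types. The `DataType` enum, the `Array` trait, and the `Container`, `Transparent`, and `sum_i32` names are all assumptions made up for illustration; only the shape of the problem is meant to carry over.

```rust
use std::any::Any;

// Toy stand-ins for arrow's `DataType` and `Array`; NOT the real arrow-rs API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
}

trait Array {
    fn data_type(&self) -> DataType;
    fn as_any(&self) -> &dyn Any;
}

// Plays the role of a built-in concrete array type.
struct Int32Array(Vec<i32>);

impl Array for Int32Array {
    fn data_type(&self) -> DataType {
        DataType::Int32
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Hypothetical "container": reports the inner data type, but `as_any`
// exposes the wrapper itself, so the two methods disagree.
struct Container(Box<dyn Array>);

impl Array for Container {
    fn data_type(&self) -> DataType {
        self.0.data_type()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Hypothetical transparent newtype: both methods delegate to the inner array.
struct Transparent(Box<dyn Array>);

impl Array for Transparent {
    fn data_type(&self) -> DataType {
        self.0.data_type()
    }
    fn as_any(&self) -> &dyn Any {
        self.0.as_any()
    }
}

// A typical consumer: `data_type` selects the concrete type to downcast to.
// This returns `None` on a failed downcast; code that uses `unwrap` or
// `expect` at the same spot would panic instead.
fn sum_i32(array: &dyn Array) -> Option<i64> {
    match array.data_type() {
        DataType::Int32 => {
            let ints = array.as_any().downcast_ref::<Int32Array>()?;
            Some(ints.0.iter().map(|&v| i64::from(v)).sum())
        }
    }
}
```

In this sketch, `sum_i32` works for `Int32Array` and for `Transparent`, but fails for `Container` even though `Container::data_type` claims `Int32`, and nothing in the `&dyn Array` signature lets the consumer tell the two wrappers apart.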
If we really wanted to support the use case of `dyn Array` as a stand-in for `dyn Any` (since `Array: !Any`), then we'd have to define a new method, a "raw" version of `Array::as_any`, that can be downcast to recover the true type, and possibly also an analogue to `Array::data_type` that could guide the use of that raw casting. That way, the custom array-as-pointer could maintain the 1:1 correspondence between `Array::data_type` and `Array::as_any`, while still allowing recovery of the original type. However, it seems like these are really just different use cases and we probably shouldn't try to conflate them in a single type.
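A rough sketch of what such a "raw" accessor might look like, again on a toy trait rather than the real arrow-rs `Array`. The method name `as_any_raw`, the `Tagged` wrapper, and the helper functions are hypothetical; nothing like this exists in arrow-rs as far as this comment is concerned.

```rust
use std::any::Any;

// Toy stand-ins; NOT the real arrow-rs API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
}

trait Array {
    fn data_type(&self) -> DataType;
    fn as_any(&self) -> &dyn Any;
    // Hypothetical "raw" accessor. Ordinary arrays keep the default, so
    // `as_any_raw` and `as_any` agree for them.
    fn as_any_raw(&self) -> &dyn Any {
        self.as_any()
    }
}

struct Int32Array(Vec<i32>);

impl Array for Int32Array {
    fn data_type(&self) -> DataType {
        DataType::Int32
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Hypothetical custom wrapper carrying extra state next to the inner array.
struct Tagged {
    inner: Box<dyn Array>,
    label: String,
}

impl Array for Tagged {
    // `data_type` and `as_any` stay in 1:1 correspondence via delegation.
    fn data_type(&self) -> DataType {
        self.inner.data_type()
    }
    fn as_any(&self) -> &dyn Any {
        self.inner.as_any()
    }
    // Only the raw accessor exposes the wrapper itself.
    fn as_any_raw(&self) -> &dyn Any {
        self
    }
}

// Ordinary consumers keep using `as_any` and see the built-in type.
fn first_i32(array: &dyn Array) -> Option<i32> {
    array.as_any().downcast_ref::<Int32Array>()?.0.first().copied()
}

// Code that knows about the wrapper can recover it through the raw accessor.
fn label_of(array: &dyn Array) -> Option<&str> {
    array
        .as_any_raw()
        .downcast_ref::<Tagged>()
        .map(|tagged| tagged.label.as_str())
}
```

With this split, `first_i32` works on a `Tagged` value just as on a plain `Int32Array`, while `label_of` recovers the wrapper only when one is present. The sketch leaves out the suggested analogue to `Array::data_type` for guiding the raw cast.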
