jorisvandenbossche commented on issue #43614: URL: https://github.com/apache/arrow/issues/43614#issuecomment-2275767309
> I wasn't aware `FixedShapeTensorArray` could be constructed with strings ..
>
> When `FixedShapeTensorArray` was built we didn't expect it'd be used for strings.

In theory, nothing in the spec says that you can only use it for numerical data types (although that is of course the typical use case): https://github.com/apache/arrow/blob/3420c0db2fe49d81bf3caf673e4e1302153a2c49/docs/source/format/CanonicalExtensions.rst?plain=1#L87-L89

And given that you can construct an extension array from its storage, you can indeed easily construct a `FixedShapeTensorArray` with any Arrow value type:

```python
>>> import pyarrow as pa
>>> storage_arr = pa.array([["a", "b"], ["c", "d"], ["e", "f"]], pa.list_(pa.string(), 2))
>>> arr = pa.ExtensionArray.from_storage(pa.fixed_shape_tensor(pa.string(), (2, )), storage_arr)
>>> arr
<pyarrow.lib.FixedShapeTensorArray object at 0x7f4d12e41da0>
[
  [
    "a",
    "b"
  ],
  [
    "c",
    "d"
  ],
  [
    "e",
    "f"
  ]
]
```

Of course, just because this is supported in general doesn't necessarily mean that `to_numpy_ndarray()` has to support all data types. But this conversion worked in 15.0 and no longer works now. With 15.0 it gives:

```
>>> arr.to_numpy_ndarray()
array([['a', 'b'],
       ['c', 'd'],
       ['e', 'f']], dtype=object)
```

From a user point of view, I don't directly see a reason not to support this conversion. The practical reason it no longer works is an implementation change in https://github.com/apache/arrow/pull/37533: we moved this conversion to a NumPy array to the C++ level by first converting to a Tensor (i.e. the `to_numpy_ndarray()` method now just calls `self.to_tensor().to_numpy()`), and a Tensor only supports numerical data types.
