jorisvandenbossche commented on issue #43614:
URL: https://github.com/apache/arrow/issues/43614#issuecomment-2275767309

   > I wasn't aware `FixedShapeTensorArray` could be constructed with strings ..
   > When `FixedShapeTensorArray` was built we didn't expect it'd be used for strings.
   
   In theory nothing in the spec says that you can only use it for numerical 
data types (although that is of course the typical use case):
   
   
https://github.com/apache/arrow/blob/3420c0db2fe49d81bf3caf673e4e1302153a2c49/docs/source/format/CanonicalExtensions.rst?plain=1#L87-L89
   
   And given you can construct an extension array from the storage, you can 
indeed easily construct a FixedShapeTensorArray with any Arrow type:
   
   ```python
   >>> import pyarrow as pa
   >>> storage_arr = pa.array([["a", "b"], ["c", "d"], ["e", "f"]], pa.list_(pa.string(), 2))
   >>> arr = pa.ExtensionArray.from_storage(pa.fixed_shape_tensor(pa.string(), (2, )), storage_arr)
   >>> arr
   <pyarrow.lib.FixedShapeTensorArray object at 0x7f4d12e41da0>
   [
     [
       "a",
       "b"
     ],
     [
       "c",
       "d"
     ],
     [
       "e",
       "f"
     ]
   ]
   ```
   
   Of course, the fact that this is supported in general doesn't necessarily 
require that we support all data types in `to_numpy_ndarray()`. Still, this 
worked in 15.0 and no longer works now:
   
   ```python
   >>> arr.to_numpy_ndarray()
   array([['a', 'b'],
          ['c', 'd'],
          ['e', 'f']], dtype=object)
   ```
   
   From a user point of view, I don't directly see a reason not to support this 
conversion. The practical reason it no longer works is an implementation 
change in https://github.com/apache/arrow/pull/37533, which moved the 
conversion to a numpy array to the C++ level by first converting to a Tensor 
(i.e. the `to_numpy_ndarray()` method now just calls 
`self.to_tensor().to_numpy()`), and a Tensor only supports numerical data types.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.