jorisvandenbossche commented on issue #29892:
URL: https://github.com/apache/arrow/issues/29892#issuecomment-1481040252
To be explicit, there is no "internal" fix to be done, as this conversion is
already possible zero-copy while preserving the dtype, _if_ you convert the
flat values (i.e. what Antoine showed above):
```
>>> import pyarrow as pa
>>> a = pa.array([[1,2,3], [4,5,6]])
>>> a.flatten().to_numpy()
array([1, 2, 3, 4, 5, 6])
>>> a.flatten().to_numpy().reshape(2, 3)
array([[1, 2, 3],
       [4, 5, 6]])
```
So it is more a question of what user-facing API we provide for this.
Do we expect users to do this themselves, or do we want to add some
"to_numpy_2d" method to FixedSizeListArray that does it for them?
The existing `to_numpy` cannot do this, because this method is expected to
give you a 1D array of the same length as the pyarrow array. I personally would
lean towards letting the user do this themselves, since this is relatively
straightforward to do and then you have full control (a method to get a 2D
array would also get messy if you have a list array with multiple levels of
nesting). So regarding the original topic, I would tend to close this issue.
But @westonpace makes a good point that the FixedShapeTensorArray extension
type that is being added might be interesting, depending on your exact use
case. The pyarrow API for that still needs to be finalized and merged, but we
were planning to add a `to_numpy_array` method (or some other name) that gives
you the actual underlying array zero-copy as an N-d array. See the examples in
the documentation that is being added in
https://github.com/apache/arrow/pull/33948