[ https://issues.apache.org/jira/browse/ARROW-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060071#comment-17060071 ]
Paul Balanca commented on ARROW-7365:
-------------------------------------
If I may continue the discussion point raised in ARROW-8010.
I believe there is a use case for FixedSizeList arrays to be convertible to
two-dimensional NumPy arrays (or even multi-dimensional ones). There are many
applications where one wants to store small vectors or matrices with known,
static dimensions (e.g. 3D vectors, 3D affine transforms). The fixed-size Arrow
column format is ideal for this purpose, and it makes it possible to write
high-performance code on top of this kind of storage.
But to write this kind of high-performance pipeline based on PyArrow, one
needs to be able to extract the full 2D NumPy array from the PyArrow object.
Technically, this is possible, as shown by the small example in ARROW-8010, but
it would probably be valuable to have it as part of the official API.
Is `to_numpy` the right place to implement it? I am not sure; I probably
don't know the project deeply enough to have a good opinion. But I
believe there are numerous pure NumPy computation pipelines based on PyArrow
in-memory storage that would benefit from a "closer to the metal" NumPy API,
independent of the Pandas-like Series representation.
> [Python] Support FixedSizeList type in conversion to numpy/pandas
> -----------------------------------------------------------------
>
> Key: ARROW-7365
> URL: https://issues.apache.org/jira/browse/ARROW-7365
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Joris Van den Bossche
> Priority: Major
> Fix For: 0.17.0
>
>
> Follow-up on ARROW-7261, still need to add support for FixedSizeListType in
> the arrow -> python conversion (arrow_to_pandas.cc)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)