[jira] [Commented] (ARROW-7365) [Python] Support FixedSizeList type in conversion to numpy/pandas

Paul Balanca (Jira) Mon, 16 Mar 2020 02:42:28 -0700


    [ https://issues.apache.org/jira/browse/ARROW-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060071#comment-17060071 ]


Paul Balanca commented on ARROW-7365:
-------------------------------------

If I may continue the discussion raised in ARROW-8010.

I believe there is a use case for FixedSizeList arrays to be convertible to
two-dimensional NumPy arrays (or even multi-dimensional ones). There are many
applications where one wants to store small vectors/matrices with known static
dimensions (e.g. 3D vectors, 3D affine transforms). The fixed-size Arrow column
format is ideal for that purpose, and it makes it possible to write
high-performance code on top of this kind of storage.

But to write this kind of high-performance pipeline based on PyArrow, one
needs to be able to extract the full 2D NumPy array from the PyArrow object.
This is technically possible, as shown by the small example in ARROW-8010, but
it would probably be valuable to have it as part of the official API.

Is `to_numpy` the right place to implement it? I am not sure; I probably
don't have enough perspective on this project to have a good opinion. But I
believe there are numerous pure NumPy computation pipelines based on PyArrow
in-memory storage that would benefit from a "closer to the metal" NumPy API,
independent of the pandas-like Series representation.

> [Python] Support FixedSizeList type in conversion to numpy/pandas
> -----------------------------------------------------------------
>
>                 Key: ARROW-7365
>                 URL: https://issues.apache.org/jira/browse/ARROW-7365
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>             Fix For: 0.17.0
>
>
> Follow-up on ARROW-7261, still need to add support for FixedSizeListType in 
> the arrow -> python conversion (arrow_to_pandas.cc)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

