[jira] [Updated] (ARROW-8010) [Python] Fixed size list not convertible to Numpy Array

Paul Balanca (Jira) Thu, 05 Mar 2020 06:30:12 -0800


     [ 
https://issues.apache.org/jira/browse/ARROW-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Paul Balanca updated ARROW-8010:
--------------------------------
    Description: 
Fixed size lists of base types (e.g. int, float, ...) are not convertible to 
NumPy arrays.

The following code:
{code:java}
import pyarrow as pa

t = pa.list_(pa.float32(), 2)
arr = pa.array([[1, 2], [3, 4], [5, 6]], type=t)
arr.to_numpy(){code}
raises an Arrow "not implemented" error, because there is no equivalent Pandas 
block for this type.

It sounds reasonable that the conversion to Pandas fails, but I would expect a 
natural conversion to a NumPy array: according to the Fixed Size List Layout 
([https://arrow.apache.org/docs/format/Columnar.html#]), a fixed size list 
array could be mapped to a 2-dimensional row-major matrix (e.g. 3x2 in the 
previous example).

Note that the expected result can be obtained with a workaround based on flatten:
{code:java}
arr.flatten().to_numpy().reshape((-1, t.list_size)){code}
This form of memory representation is quite natural if one wants to use Apache 
Arrow for in-memory collections of 2D/3D points, where the coordinates of each 
point should be contiguous in memory.
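
To make the layout argument concrete, here is a minimal sketch of why the 
reshape in the workaround is valid. It deliberately uses only the Python 
standard library (array and memoryview) as a stand-in for the Arrow child 
values buffer; it is not pyarrow code, and the names as_rows and child are 
hypothetical, chosen only for this illustration.

```python
from array import array


def as_rows(values, list_size):
    """View a flat, contiguous buffer as rows of `list_size` items (row-major).

    `values` stands in for the child values of a fixed size list array:
    the items of list i occupy positions i * list_size .. (i + 1) * list_size - 1.
    """
    n_rows = len(values) // list_size
    raw = memoryview(values).cast("B")  # raw bytes, no copy
    # Reinterpret the same bytes as a 2-D view; raises TypeError if the
    # buffer length is not an exact multiple of list_size.
    return raw.cast(values.typecode, shape=[n_rows, list_size])


# Child values of [[1, 2], [3, 4], [5, 6]] stored back to back as float32.
child = array("f", [1, 2, 3, 4, 5, 6])
matrix = as_rows(child, 2)

print(matrix.shape)     # (3, 2)
print(matrix.tolist())  # [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(matrix[2, 1] == child[2 * 2 + 1])  # True: element (i, j) is at i * list_size + j
```

Because the 2-D view shares memory with the flat buffer, no copy is needed, 
which is the property that makes the flatten + reshape workaround cheap.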

  was:
Fixed size lists of base types (e.g. int, float, ...) are not convertible to 
NumPy arrays.

The following code:
{code:java}
import pyarrow as pa

t = pa.list_(pa.float32(), 2)
arr = pa.array([[1, 2], [3, 4], [5, 6]], type=t)
arr.to_numpy(){code}
raises an Arrow "not implemented" error, because there is no equivalent Pandas 
block for this type.

It sounds reasonable that the conversion to Pandas fails, but I would expect a 
natural conversion to a NumPy array: according to the Fixed Size List Layout 
([https://arrow.apache.org/docs/format/Columnar.html#]), a fixed size list 
array could be mapped to a 2-dimensional row-major matrix (e.g. 3x2 in the 
previous example).

This form of memory representation is quite natural if one wants to use Apache 
Arrow for in-memory collections of 2D/3D points, where the coordinates of each 
point should be contiguous in memory.


> [Python] Fixed size list not convertible to Numpy Array
> -------------------------------------------------------
>
>                 Key: ARROW-8010
>                 URL: https://issues.apache.org/jira/browse/ARROW-8010
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 0.16.0
>         Environment: Ubuntu 19.10 + python 3.7
>            Reporter: Paul Balanca
>            Priority: Major
>
> Fixed size lists of base types (e.g. int, float, ...) are not convertible to 
> NumPy arrays.
> The following code:
> {code:java}
> import pyarrow as pa
> t = pa.list_(pa.float32(), 2)
> arr = pa.array([[1, 2], [3, 4], [5, 6]], type=t)
> arr.to_numpy(){code}
> raises an Arrow "not implemented" error, because there is no equivalent 
> Pandas block for this type.
> It sounds reasonable that the conversion to Pandas fails, but I would expect 
> a natural conversion to a NumPy array: according to the Fixed Size List 
> Layout ([https://arrow.apache.org/docs/format/Columnar.html#]), a fixed size 
> list array could be mapped to a 2-dimensional row-major matrix (e.g. 3x2 in 
> the previous example).
> Note that the expected result can be obtained with a workaround based on flatten:
> {code:java}
> arr.flatten().to_numpy().reshape((-1, t.list_size)){code}
> This form of memory representation is quite natural if one wants to use 
> Apache Arrow for in-memory collections of 2D/3D points, where the coordinates 
> of each point should be contiguous in memory.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
