[ https://issues.apache.org/jira/browse/ARROW-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Balanca updated ARROW-8010:
--------------------------------
Description:
Fixed size lists of base types (e.g. int, float, ...) are not convertible to a
NumPy array.
The following code:
{code:python}
import pyarrow as pa
t = pa.list_(pa.float32(), 2)
arr = pa.array([[1, 2], [3, 4], [5, 6]], type=t)
arr.to_numpy()
{code}
raises an Arrow "not implemented" error, as there is no pandas block
equivalent. It is reasonable that the conversion to pandas fails, but I would
expect a natural conversion to a NumPy array: according to the Fixed Size List
Layout ([https://arrow.apache.org/docs/format/Columnar.html#]), such an array
can be mapped to a 2-dimensional row-major matrix (e.g. 3x2 in the previous
example).
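A minimal sketch of the row-major mapping claimed above, assuming an unsliced array with no nulls and a recent pyarrow release: the child array returned by FixedSizeListArray.values stores all elements back to back, so element j of list slot i sits at flat index i * k + j, where k is the list size. Note that .values ignores any slice offset of the parent array, which is why this sketch sticks to an unsliced array.

{code:python}
import pyarrow as pa

t = pa.list_(pa.float32(), 2)
arr = pa.array([[1, 2], [3, 4], [5, 6]], type=t)

k = t.list_size      # fixed number of elements per list slot (2 here)
child = arr.values   # child array holding every element back to back

# The child array has exactly len(arr) * k elements ...
assert len(child) == len(arr) * k

# ... and element j of list slot i lives at flat index i * k + j,
# which is the row-major layout of a len(arr) x k matrix.
for i in range(len(arr)):
    row = arr[i].as_py()
    for j in range(k):
        assert child[i * k + j].as_py() == row[j]
{code}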
Note that the expected result can be obtained with a workaround based on
flatten():
{code:python}
arr.flatten().to_numpy().reshape((-1, t.list_size))
{code}
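The workaround above can be wrapped in a small guarded helper. This is a sketch only: fixed_size_list_to_numpy is a hypothetical name, not a pyarrow API, and it assumes a recent pyarrow where flatten() takes the slice offset into account and Array.to_numpy() defaults to zero-copy. Null list slots are rejected explicitly because they have no meaningful row in a dense matrix.

{code:python}
import numpy as np
import pyarrow as pa


def fixed_size_list_to_numpy(arr):
    """Hypothetical helper: view a FixedSizeListArray as an (n, k) NumPy array."""
    if not pa.types.is_fixed_size_list(arr.type):
        raise TypeError(f"expected a fixed size list array, got {arr.type}")
    if arr.null_count:
        # A null list slot cannot be represented as a matrix row, and
        # flatten() may drop its elements, breaking the (n, k) shape.
        raise ValueError("null list slots cannot be mapped to matrix rows")
    # flatten() honours slicing (unlike .values); to_numpy() is zero-copy
    # for a numeric child without nulls and raises if a copy is needed.
    flat = arr.flatten().to_numpy()
    return flat.reshape((len(arr), arr.type.list_size))
{code}

Raising on nulls, rather than silently copying or filling, keeps the result a zero-copy view of the Arrow buffer, which is the property the 2D/3D point use case relies on.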
This memory representation is quite natural if one wants to use Apache Arrow
for in-memory collections of 2D/3D points, where we wish to have the
coordinates contiguous in memory.
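A sketch of the 3D point use case, under the assumption that the points already live in a row-major (N, 3) float32 NumPy array (the data below is made up for illustration). FixedSizeListArray.from_arrays groups a flat coordinate array into lists of 3, and the flatten/reshape workaround recovers the (N, 3) matrix.

{code:python}
import numpy as np
import pyarrow as pa

# Hypothetical point cloud: 4 points with x, y, z coordinates (row-major).
points = np.arange(12, dtype=np.float32).reshape(4, 3)

# Use the contiguous coordinate buffer as the child array, then group it
# into fixed size lists of 3 coordinates per point.
flat = pa.array(points.ravel())
cloud = pa.FixedSizeListArray.from_arrays(flat, 3)

assert cloud.type == pa.list_(pa.float32(), 3)
assert len(cloud) == 4

# Going back: the flatten/reshape workaround yields a C-contiguous matrix.
back = cloud.flatten().to_numpy().reshape((-1, cloud.type.list_size))
assert np.array_equal(back, points)
assert back.flags["C_CONTIGUOUS"]
{code}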
was:
Fixed size lists of base types (e.g. int, float, ...) are not convertible to a
NumPy array.
The following code:
{code:python}
import pyarrow as pa
t = pa.list_(pa.float32(), 2)
arr = pa.array([[1, 2], [3, 4], [5, 6]], type=t)
arr.to_numpy()
{code}
raises an Arrow "not implemented" error, as there is no pandas block
equivalent. It is reasonable that the conversion to pandas fails, but I would
expect a natural conversion to a NumPy array: according to the Fixed Size List
Layout ([https://arrow.apache.org/docs/format/Columnar.html#]), such an array
can be mapped to a 2-dimensional row-major matrix (e.g. 3x2 in the previous
example).
This memory representation is quite natural if one wants to use Apache Arrow
for in-memory collections of 2D/3D points, where we wish to have the
coordinates contiguous in memory.
> [Python] Fixed size list not convertible to Numpy Array
> -------------------------------------------------------
>
> Key: ARROW-8010
> URL: https://issues.apache.org/jira/browse/ARROW-8010
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Affects Versions: 0.16.0
> Environment: Ubuntu 19.10 + python 3.7
> Reporter: Paul Balanca
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)