[https://issues.apache.org/jira/browse/ARROW-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Ben Epstein updated ARROW-14320:
--------------------------------
Description:
When converting a one-dimensional array to NumPy, the dtype is preserved:
{code:python}
import pyarrow as pa
x = pa.array([.234, .345, .456])
x.to_numpy().dtype  # dtype('float64'){code}
But when doing the same for a multi-dimensional (nested list) array, the dtype
is lost *and cannot be set manually*:
{code:python}
import numpy as np
import pyarrow as pa
x = pa.array([[1, 2, 3], [4, 5, 6]]).to_numpy(zero_copy_only=False)
print(x.dtype)  # object
x.astype(np.float64)  # ValueError: setting an array element with a sequence.{code}
In other words, NumPy treats the result as non-uniform: a 1-D array of objects,
one per row, rather than a 2-D numeric array. The only way to get it to the
proper dtype is to convert it to a Python list and then back to a NumPy array.
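The sketch below inspects what the conversion actually returns and shows two ways to get a typed 2-D array back. It assumes the default type inference (list<int64>) and rows of equal length; np.stack is offered as an alternative to the list round-trip described above, not as something taken from this report.

{code:python}
import numpy as np
import pyarrow as pa

arr = pa.array([[1, 2, 3], [4, 5, 6]])  # inferred Arrow type: list<int64>
obj = arr.to_numpy(zero_copy_only=False)

# The result is a 1-D object array holding one sub-array per row,
# not a 2-D numeric array, which is why astype() cannot coerce it.
print(obj.shape, obj.dtype)  # (2,) object
print(type(obj[0]), obj[0].dtype)

# Workaround from the report: round-trip through Python lists.
via_list = np.array(arr.to_pylist(), dtype=np.float64)

# Alternative (assumes equal-length rows): stack the per-row sub-arrays.
via_stack = np.stack(obj).astype(np.float64)

print(via_list.shape, via_list.dtype)  # (2, 3) float64
{code}

np.stack avoids creating a Python int object for every element, but it still raises if the rows have different lengths.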
Is there another way to achieve this? Or, at least, can it be fixed so that the
dtype of the NumPy array can be set manually after conversion?
I know that pyarrow doesn't support ndarrays with ndim>1
(https://issues.apache.org/jira/browse/ARROW-5645), but I was curious whether
this can be achieved going the other way (from Arrow to NumPy).
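Going from Arrow to a 2-D NumPy array can be sketched without the Python-list round-trip by flattening the list array to its typed child values and reshaping. This is a minimal sketch assuming a ListArray with no null rows and rows of equal length; list_array_to_2d is a hypothetical helper, not a pyarrow API, built on ListArray.flatten() and ListArray.value_lengths().

{code:python}
import numpy as np
import pyarrow as pa


def list_array_to_2d(arr: pa.ListArray) -> np.ndarray:
    """Hypothetical helper: convert a ListArray whose rows all have the
    same length into a 2-D ndarray that keeps the child value dtype."""
    if arr.null_count:
        raise ValueError("null rows cannot be placed in a dense 2-D array")
    # Child values as one typed 1-D array (flatten() respects slice offsets).
    flat = arr.flatten().to_numpy(zero_copy_only=False)
    if len(arr) == 0:
        return flat.reshape(0, 0)
    # Per-row lengths; a dense 2-D result requires them all to be equal.
    lengths = arr.value_lengths().to_numpy()
    if not (lengths == lengths[0]).all():
        raise ValueError("rows have different lengths; cannot reshape to 2-D")
    return flat.reshape(len(arr), int(lengths[0]))


x = list_array_to_2d(pa.array([[1, 2, 3], [4, 5, 6]]))
print(x.dtype, x.shape)  # int64 (2, 3)
y = x.astype(np.float64)  # casting now works because x is a numeric array
{code}

Because flatten() returns the child values directly, the result keeps the Arrow value type (int64 here) and astype() behaves as it does for any numeric ndarray.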
> Pyarrow array to_numpy array corrupts numpy dtype
> -------------------------------------------------
>
> Key: ARROW-14320
> URL: https://issues.apache.org/jira/browse/ARROW-14320
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 5.0.0
> Reporter: Ben Epstein
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)