[
https://issues.apache.org/jira/browse/ARROW-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Antoine Pitrou resolved ARROW-5651.
-----------------------------------
Resolution: Fixed
Fix Version/s: 0.15.0
Issue resolved by pull request 5005
[https://github.com/apache/arrow/pull/5005]
> [Python] Incorrect conversion from strided Numpy array when other type is
> specified
> -----------------------------------------------------------------------------------
>
> Key: ARROW-5651
> URL: https://issues.apache.org/jira/browse/ARROW-5651
> Project: Apache Arrow
> Issue Type: Bug
> Affects Versions: 0.12.0
> Reporter: Fabian Höring
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> In the example below, pa.array gives wrong results for a strided NumPy array
> when the requested type differs from the array's NumPy dtype:
> {code}
> >>> import pyarrow as pa
> >>> import numpy as np
> >>> np_array = np.arange(0, 10, dtype=np.float32)[1:-1:2]
> >>> pa.array(np_array, type=pa.float64())
> <pyarrow.lib.DoubleArray object at 0x7f8453de8138>
> [
> 1,
> 2,
> 3,
> 4
> ]
> {code}
> Copying the NumPy array to a new location first gives the expected output:
> {code}
> >>> import pyarrow as pa
> >>> import numpy as np
> >>> np_array = np.array(np.arange(0, 10, dtype=np.float32)[1:-1:2])
> >>> pa.array(np_array, type=pa.float64())
> <pyarrow.lib.DoubleArray object at 0x7f5a0af0a4a8>
> [
> 1,
> 3,
> 5,
> 7
> ]
> {code}
> Looking at the
> [code|https://github.com/apache/arrow/blob/7a5562174cffb21b16f990f64d114c1a94a30556/cpp/src/arrow/python/numpy_to_arrow.cc#L407]
> it seems that the stride, counted in elements, is computed from the size of
> the target type instead of the size of the source NumPy type.
> In this case the stride is 8 bytes, which corresponds to 2 float32 elements.
> The code instead divides by the size of the target type, float64, which gives
> a stride of 1 element. It therefore reads consecutive elements rather than
> every second element until it reaches the total number of elements.
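
The stride arithmetic described in the report can be reproduced with NumPy alone. This is only a sketch of the reasoning, not the actual logic in numpy_to_arrow.cc: the names `correct_step` and `buggy_step` and the index-based read loop are assumptions made for illustration.

```python
import numpy as np

base = np.arange(0, 10, dtype=np.float32)
view = base[1:-1:2]                      # strided view of elements 1, 3, 5, 7

stride_bytes = view.strides[0]           # byte distance between view elements
src_itemsize = view.dtype.itemsize       # float32: 4 bytes
dst_itemsize = np.dtype(np.float64).itemsize  # float64: 8 bytes

# Stride in elements, using the source dtype (expected) versus the
# target dtype (the behaviour the report suspects).
correct_step = stride_bytes // src_itemsize
buggy_step = stride_bytes // dst_itemsize

start = 1                                # the view begins at base[1]
n = len(view)

# Illustrative read loop over the underlying float32 buffer.
correct = [float(base[start + i * correct_step]) for i in range(n)]
buggy = [float(base[start + i * buggy_step]) for i in range(n)]

print(correct_step, correct)
print(buggy_step, buggy)
```

Under these assumptions, dividing by the float32 item size steps over every second element, while dividing by the float64 item size steps over consecutive elements, which matches the wrong output shown in the report.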