[ 
https://issues.apache.org/jira/browse/ARROW-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Höring updated ARROW-5651:
---------------------------------
    Issue Type: Bug  (was: Improvement)

> [Python] Incorrect conversion from strided Numpy array when other type is 
> specified
> -----------------------------------------------------------------------------------
>
>                 Key: ARROW-5651
>                 URL: https://issues.apache.org/jira/browse/ARROW-5651
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Fabian Höring
>            Priority: Major
>
> In the example below, pa.array gives wrong results for a strided numpy 
> array when a different target type is specified:
> {code}
> >> import pyarrow as pa
> >> import numpy as np
> >> import pandas as pd
> >> p_s = pd.Series(np.arange(0, 10, dtype=np.float32)[1:-1:2])
> >> pa.array(p_s, type=pa.float64())
> <pyarrow.lib.DoubleArray object at 0x7f8453de8138>
> [
>   1,
>   2,
>   3,
>   4
> ]
> {code}
> Copying the numpy array to a new location first gives the expected output:
> {code}
> >> import pyarrow as pa
> >> import numpy as np
> >> import pandas as pd
> >> p_s = pd.Series(np.array(np.arange(0, 10, dtype=np.float32)[1:-1:2]))
> >> pa.array(p_s, type=pa.float64())
> <pyarrow.lib.DoubleArray object at 0x7f5a0af0a4a8>
> [
>   1,
>   3,
>   5,
>   7
> ]
> {code}
> Looking at the 
> [code|https://github.com/apache/arrow/blob/7a5562174cffb21b16f990f64d114c1a94a30556/cpp/src/arrow/python/numpy_to_arrow.cc#L407]
>  it seems that the number of elements to step over is computed from the 
> target type instead of the original numpy type.
> In this case the stride is 8 bytes, which corresponds to 2 float32 elements. 
> The code instead computes the step using the target type, float64, which 
> gives 1 element. It therefore reads the source array one element at a time 
> instead of every second element until it reaches the total number of elements.
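
The stride arithmetic described above can be illustrated with a minimal NumPy-only sketch. This is not the Arrow C++ code: the names step_from_source, step_from_target and start are hypothetical, and the sketch only assumes that the element step is obtained by dividing the byte stride by an item size. Dividing by the float64 item size reproduces the wrong values from the report, while dividing by the float32 item size gives the expected ones.

```python
import numpy as np

# Same input as in the report: a float32 view taking every second element.
base = np.arange(0, 10, dtype=np.float32)
strided = base[1:-1:2]  # refers to base[1], base[3], base[5], base[7]

byte_stride = strided.strides[0]                # bytes between consecutive elements
src_itemsize = strided.dtype.itemsize           # float32 item size in bytes
dst_itemsize = np.dtype(np.float64).itemsize    # float64 item size in bytes

# Hypothetical names: element step derived from the source vs. the target type.
step_from_source = byte_stride // src_itemsize
step_from_target = byte_stride // dst_itemsize

start = 1          # index in base of the first element of the view
n = len(strided)   # number of elements to read

# Walk the underlying buffer with each step.
read_correct = [float(base[start + i * step_from_source]) for i in range(n)]
read_buggy = [float(base[start + i * step_from_target]) for i in range(n)]

print(byte_stride, step_from_source, step_from_target)
print(read_correct)
print(read_buggy)
```

Until a fix is available, passing a contiguous copy of the data (as in the second example of the report) sidesteps the stride calculation.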



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
