[jira] [Created] (ARROW-5651) [Python] Incorrect conversion from strided Numpy array when other type is specified

JIRA Wed, 19 Jun 2019 08:41:10 -0700

Fabian Höring created ARROW-5651:
------------------------------------

             Summary: [Python] Incorrect conversion from strided Numpy array 
when other type is specified
                 Key: ARROW-5651
                 URL: https://issues.apache.org/jira/browse/ARROW-5651
             Project: Apache Arrow
          Issue Type: Improvement
    Affects Versions: 0.12.0
            Reporter: Fabian Höring



In the example below, pa.array returns wrong values for a strided numpy array 
when a target type different from the source dtype is specified (the expected 
values are 1, 3, 5, 7):

{code}
>>> import pyarrow as pa
>>> import numpy as np
>>> import pandas as pd
>>> p_s = pd.Series(np.arange(0, 10, dtype=np.float32)[1:-1:2])
>>> pa.array(p_s, type=pa.float64())
<pyarrow.lib.DoubleArray object at 0x7f8453de8138>
[
  1,
  2,
  3,
  4
]
{code}

When the numpy array is first copied to a new location, it gives the expected 
output:

{code}
>>> import pyarrow as pa
>>> import numpy as np
>>> import pandas as pd
>>> p_s = pd.Series(np.array(np.arange(0, 10, dtype=np.float32)[1:-1:2]))
>>> pa.array(p_s, type=pa.float64())
<pyarrow.lib.DoubleArray object at 0x7f5a0af0a4a8>
[
  1,
  3,
  5,
  7
]
{code}
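
As a possible workaround, assuming the problem only affects non-contiguous 
input as the two examples above suggest, the array can be checked for 
contiguity and copied before it is handed to pyarrow. The sketch below uses 
only numpy and stops short of the pa.array call; the variable names are 
illustrative:

```python
import numpy as np

# Same strided view as in the report: every second float32 element.
strided = np.arange(0, 10, dtype=np.float32)[1:-1:2]

# A sliced view with step 2 is not C-contiguous: its stride (8 bytes)
# is larger than its item size (4 bytes).
is_contiguous = strided.flags["C_CONTIGUOUS"]

# np.ascontiguousarray copies only when needed and returns an array
# whose stride equals the item size.
contiguous = np.ascontiguousarray(strided)

print(is_contiguous, strided.strides, contiguous.strides)
```

The contiguous copy holds the same values as the view, so passing it to 
pa.array should correspond to the second example above.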

Looking at the 
[code|https://github.com/apache/arrow/blob/7a5562174cffb21b16f990f64d114c1a94a30556/cpp/src/arrow/python/numpy_to_arrow.cc#L407]
 it seems that the stride, expressed as a number of elements, is computed 
using the target type instead of the original numpy type.

In this case the stride is 8 bytes, which corresponds to 2 float32 elements. 
The code instead computes the stride with the target type, which gives 1 
float64 element. It therefore reads the source buffer one element at a time 
instead of every second element, until it reaches the total number of 
elements.
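
The stride arithmetic described above can be reproduced with numpy alone. 
This is only a sketch of the suspected calculation, not the actual Arrow C++ 
code; the helper read_with_step and the variable names are hypothetical:

```python
import numpy as np

base = np.arange(0, 10, dtype=np.float32)
strided = base[1:-1:2]  # view over base, starting at index 1

stride_bytes = strided.strides[0]             # 8 bytes between elements
src_itemsize = strided.dtype.itemsize         # 4 bytes (float32)
dst_itemsize = np.dtype(np.float64).itemsize  # 8 bytes (float64)

# Stride in elements, computed with the source type and with the target type.
correct_step = stride_bytes // src_itemsize   # 2
wrong_step = stride_bytes // dst_itemsize     # 1


def read_with_step(buffer, start, step, count):
    """Hypothetical helper: read `count` values from `buffer` every `step` elements."""
    return [float(buffer[start + i * step]) for i in range(count)]


n = len(strided)
expected = read_with_step(base, 1, correct_step, n)
buggy = read_with_step(base, 1, wrong_step, n)

print(expected)  # [1.0, 3.0, 5.0, 7.0]
print(buggy)     # [1.0, 2.0, 3.0, 4.0]
```

Dividing the byte stride by the float64 item size yields the same 1, 2, 3, 4 
sequence as the incorrect output in the first example.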







--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
