[ 
https://issues.apache.org/jira/browse/ARROW-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-7112:
----------------------------------
    Fix Version/s: 0.15.1

> Wrong contents when initializing a pyarrow.Table from boolean DataFrame
> -----------------------------------------------------------------------
>
>                 Key: ARROW-7112
>                 URL: https://issues.apache.org/jira/browse/ARROW-7112
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.1
>         Environment: Tested with 0.14.1 and 0.14.0.RAY from pip3 on Ubuntu
>            Reporter: Joachim Haga
>            Priority: Major
>             Fix For: 0.15.1
>
>
> When initializing a Table from a boolean pandas.DataFrame _that is not in 
> Fortran order_, the contents of the resulting Table differ from the 
> contents of the DataFrame.
> Sample:
>  
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import numpy as np
> mask = np.full((3,3), False)
> mask[:,1] = True
> df = pd.DataFrame(mask)
> print(df)
> print(pa.table(df).to_pandas()) 
> {code}
>  
> The output:
>  
> {noformat}
>        0     1      2
> 0  False  True  False
> 1  False  True  False
> 2  False  True  False
>        0      1      2
> 0  False   True  False
> 1  False  False  False
> 2  False  False  False
> {noformat}
> I.e., column 1 is different before and after roundtripping through pa.Table.
> If I add *{{order='F'}}* to the *{{np.full}}* invocation, the result is as 
> expected. Also, the problem seems to disappear if I use {{*dtype=int*}}.
>  
>  
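
The workaround mentioned in the report (Fortran order) can be sketched as
follows. This is a minimal illustration, not the fix shipped in 0.15.1: it
assumes that copying the array with np.asfortranarray before building the
DataFrame is equivalent to passing order='F' to np.full. The helper name
roundtrips_cleanly is hypothetical and needs pyarrow installed when called.

```python
import numpy as np
import pandas as pd

# Same data as in the report: a C-ordered (row-major) boolean array.
mask = np.full((3, 3), False)
mask[:, 1] = True

# Workaround sketch: copy into Fortran (column-major) order before wrapping
# the array in a DataFrame. The values are unchanged, only the layout differs.
mask_f = np.asfortranarray(mask)
df_f = pd.DataFrame(mask_f)


def roundtrips_cleanly(df):
    """Hypothetical helper: compare values before and after a Table roundtrip."""
    import pyarrow as pa  # imported lazily; only needed when the helper is called

    restored = pa.table(df).to_pandas()
    return np.array_equal(restored.to_numpy(), df.to_numpy())
```

On an affected version, roundtrips_cleanly(pd.DataFrame(mask)) would be
expected to return False according to the report, while the Fortran-ordered
df_f should survive the roundtrip; neither result was verified here.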



--
This message was sent by Atlassian Jira
(v8.3.4#803005)