[
https://issues.apache.org/jira/browse/ARROW-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Antoine Pitrou updated ARROW-7112:
----------------------------------
Fix Version/s: 0.15.1
> Wrong contents when initializing a pyarrow.Table from boolean DataFrame
> -----------------------------------------------------------------------
>
> Key: ARROW-7112
> URL: https://issues.apache.org/jira/browse/ARROW-7112
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.14.1
> Environment: Tested with 0.14.1 and 0.14.0.RAY from pip3 on ubuntu
> Reporter: Joachim Haga
> Priority: Major
> Fix For: 0.15.1
>
>
> When initializing a Table from a boolean pandas.DataFrame _that is not in
> Fortran order_, the contents of the resulting Table differ from the
> contents of the DataFrame.
> Sample:
>
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import numpy as np
> mask = np.full((3,3), False)
> mask[:,1] = True
> df = pd.DataFrame(mask)
> print(df)
> print(pa.table(df).to_pandas())
> {code}
>
> The output:
>
> {noformat}
>        0     1      2
> 0  False  True  False
> 1  False  True  False
> 2  False  True  False
>        0      1      2
> 0  False   True  False
> 1  False  False  False
> 2  False  False  False
> {noformat}
> I.e., column 1 is different before and after roundtripping through pa.Table.
> If I add *{{order='F'}}* to the *{{np.full}}* invocation, the result is as
> expected. Also, the problem seems to disappear if I use {{*dtype=int*}}.
>
>
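To make the layout distinction in the report concrete, here is a minimal NumPy-only sketch showing how the same boolean mask differs between C order (the np.full default) and Fortran order. It only inspects memory layout and does not call pyarrow. The names mask_c, mask_f, col_c and col_f are illustrative and not from the report. That the strided column is what triggers the wrong contents is an assumption, not something the report confirms.

```python
import numpy as np

# Same mask as in the report; NumPy defaults to C (row-major) order.
mask_c = np.full((3, 3), False)
mask_c[:, 1] = True

# Copy with identical values but Fortran (column-major) layout,
# comparable to passing order='F' to np.full as the report suggests.
mask_f = np.asfortranarray(mask_c)

print(mask_c.flags["C_CONTIGUOUS"], mask_c.strides)
print(mask_f.flags["F_CONTIGUOUS"], mask_f.strides)

# A single column is a strided view in C order and a contiguous
# view in Fortran order (assumed, not confirmed, to be the relevant
# difference for the conversion).
col_c = mask_c[:, 1]
col_f = mask_f[:, 1]
print(col_c.strides, col_f.strides)
```

On affected versions, building the DataFrame from an array like mask_f instead of mask_c may serve as a workaround, based on the observation above that order='F' gives the expected result.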