[ https://issues.apache.org/jira/browse/ARROW-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
George Sakkis updated ARROW-4492:
---------------------------------
    Fix Version/s:     (was: 0.14.0)
                   0.12.1

> [Python] Failure reading Parquet column as pandas Categorical in 0.12
> ---------------------------------------------------------------------
>
>                 Key: ARROW-4492
>                 URL: https://issues.apache.org/jira/browse/ARROW-4492
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.0
>            Reporter: George Sakkis
>            Priority: Major
>              Labels: Parquet
>             Fix For: 0.12.1
>
>         Attachments: slug.pq
>
>
> On pyarrow 0.12.0 some (but not all) columns cannot be read as category dtype. Attached is an extracted failing sample.
> {noformat}
> import dask.dataframe as dd
> df = dd.read_parquet('slug.pq', categories=['slug'], engine='pyarrow').compute()
> print(len(df['slug'].dtype.categories))
> {noformat}
> This works on pyarrow 0.11.1 (and fastparquet 0.2.1).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)