[
https://issues.apache.org/jira/browse/ARROW-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913008#comment-16913008
]
Joris Van den Bossche commented on ARROW-6302:
----------------------------------------------
[~galuhsahid] Great!
Yes, it is the {{ApplyOriginalMetadata}} function in that file that needs to be
updated. Apart from a test, I _think_ that's the only place that needs a change.
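
To illustrate the kind of change meant here, below is a minimal pure-Python sketch of the idea: when the type inferred from the Parquet file and the type stored in the serialized Arrow schema are both dictionary types, copy the "ordered" flag from the stored one. This is only a toy model, not the actual C++ code: the {{DictionaryType}} dataclass and the {{apply_original_metadata}} helper are hypothetical names made up for this example, and the compatibility check (same value type) is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DictionaryType:
    """Toy stand-in for a dictionary type (hypothetical, not pyarrow's class)."""
    index_type: str
    value_type: str
    ordered: bool = False


def apply_original_metadata(
    inferred: DictionaryType, original: Optional[DictionaryType]
) -> DictionaryType:
    """Return `inferred` with `ordered` restored from `original`.

    Assumption for this sketch: the flag is only copied when a stored
    original type exists and its value type matches the inferred one.
    """
    if original is None:
        # No serialized schema available: nothing to restore.
        return inferred
    if inferred.value_type != original.value_type:
        # Types are not compatible: leave the inferred type untouched.
        return inferred
    return replace(inferred, ordered=original.ordered)


# Type recorded in the serialized schema at write time (ordered=True).
original = DictionaryType("int8", "string", ordered=True)
# Type the reader comes up with on its own (ordered defaults to False).
inferred = DictionaryType("int8", "string")

restored = apply_original_metadata(inferred, original)
print(restored.ordered)
```

A test for the real fix would presumably assert the same thing end to end: write an ordered categorical column, read it back, and check that the "ordered" property is still set.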
> [Python][Parquet] Reading dictionary type with serialized Arrow schema does
> not restore "ordered" type property
> ---------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-6302
> URL: https://issues.apache.org/jira/browse/ARROW-6302
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.15.0
> Reporter: Galuh Sahid
> Priority: Major
> Labels: parquet
> Fix For: 0.15.0
>
>
> In pandas, I tried roundtripping to parquet with {{to_parquet}} and
> {{read_parquet}}. It preserves categorical dtypes but does not preserve their
> order.
> {code:python}
> import pandas as pd
> from pandas.io.parquet import read_parquet, to_parquet
> df = pd.DataFrame()
> df["a"] = pd.Categorical(["a", "b", "c", "a"], categories=["b", "c", "d"],
>                          ordered=True)
> df.to_parquet(<path>)
> actual = read_parquet(<path>)
> df["a"]
> 0    NaN
> 1      b
> 2      c
> 3    NaN
> Name: a, dtype: category
> Categories (3, object): [b < c < d]
> actual["a"]
> 0    NaN
> 1      b
> 2      c
> 3    NaN
> Name: a, dtype: category
> Categories (3, object): [b, c, d]
> {code}
>