[ https://issues.apache.org/jira/browse/ARROW-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858460#comment-16858460 ]
Joris Van den Bossche commented on ARROW-3801:
----------------------------------------------

[~buhrmann] do you know which version of pandas you were using?

With the combinations pandas + arrow master or pandas 0.24.2 + arrow 0.12.1, this works fine for me (the categories of the reordered categorical are turned into a writable numpy array).

There have been improvements in pandas to deal with read-only arrays related to hashtables, such as https://github.com/pandas-dev/pandas/pull/18825 and https://github.com/pandas-dev/pandas/pull/21688, so those might have fixed it.

> [Python] Pandas-Arrow roundtrip makes pd categorical index not writeable
> ------------------------------------------------------------------------
>
>                 Key: ARROW-3801
>                 URL: https://issues.apache.org/jira/browse/ARROW-3801
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.10.0
>            Reporter: Thomas Buhrmann
>            Priority: Major
>             Fix For: 0.14.0
>
>
> Serializing and deserializing a pandas series with categorical dtype will
> make the categorical index non-writeable, which in turn trips up pandas when
> e.g. reordering the categories, raising "ValueError: buffer source array is
> read-only":
> {code}
> import pandas as pd
> import pyarrow as pa
> df = pd.Series([1,2,3], dtype='category', name="c1").to_frame()
> print("DType before:", repr(df.c1.dtype))
> print("Writeable:", df.c1.cat.categories.values.flags.writeable)
> ro = df.c1.cat.reorder_categories([3,2,1])
> print("DType reordered:", repr(ro.dtype), "\n")
> tbl = pa.Table.from_pandas(df)
> df2 = tbl.to_pandas()
> print("DType after:", repr(df2.c1.dtype))
> print("Writeable:", df2.c1.cat.categories.values.flags.writeable)
> ro = df2.c1.cat.reorder_categories([3,2,1])
> print("DType reordered:", repr(ro.dtype), "\n")
> {code}
>
> Outputs:
>
> {code:java}
> DType before: CategoricalDtype(categories=[1, 2, 3], ordered=False)
> Writeable: True
> DType reordered: CategoricalDtype(categories=[3, 2, 1], ordered=False)
> DType after: CategoricalDtype(categories=[1, 2, 3], ordered=False)
> Writeable: False
> ---------------------------------------------------------------------------
> ValueError Traceback (most recent call last)
> <ipython-input-365-85b439586c1a> in <module>
> 12 print("DType after:", repr(df2.c1.dtype))
> 13 print("Writeable:", df2.c1.cat.categories.values.flags.writeable)
> ---> 14 ro = df2.c1.cat.reorder_categories([3,2,1])
> 15 print("DType reordered:", repr(ro.dtype), "\n")
> {code}
>
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
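
For anyone hitting the read-only flag on an older pandas, a minimal sketch of the idea behind a workaround might look like the following. It uses only numpy and simulates the read-only buffer with `setflags` rather than an actual Arrow roundtrip; the helper name `ensure_writeable` is hypothetical and not part of pandas or pyarrow.

```python
import numpy as np


def ensure_writeable(values):
    """Return `values` as an array that is safe to write to.

    Hypothetical helper for illustration: a writeable array is returned
    unchanged, while a read-only one is replaced by a fresh copy.
    """
    arr = np.asarray(values)
    if arr.flags.writeable:
        return arr
    # ndarray.copy() allocates new memory that the copy owns, so the
    # result is writeable even though the source buffer is not.
    return arr.copy()


# Simulate the categories array coming back read-only (assumption: this
# stands in for the array backing df2.c1.cat.categories in the report).
categories = np.array([1, 2, 3])
categories.setflags(write=False)

fixed = ensure_writeable(categories)
print("Writeable before:", categories.flags.writeable)
print("Writeable after:", fixed.flags.writeable)
```

Whether such a copy is needed at all depends on the pandas version in use; as noted in the comment, newer releases appear to handle read-only arrays themselves.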