[ https://issues.apache.org/jira/browse/ARROW-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858552#comment-16858552 ]
Joris Van den Bossche commented on ARROW-3801:
----------------------------------------------
I am not yet very familiar with the logic behind the conversions from Arrow to
Python, but I want to note that a plain array conversion also gives a read-only
NumPy array:
{code:python}
In [53]: a = pa.array([1, 2, 3])
In [54]: a.to_pandas()
Out[54]: array([1, 2, 3])
In [55]: a.to_pandas().flags.writeable
Out[55]: False
{code}
So this is in any case not specific to categoricals (DictionaryArray). The
codes of the categorical, for example, are read-only as well.
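
To make concrete what the read-only flag means in practice, here is a minimal sketch. It uses a plain NumPy array with the writeable flag switched off as a stand-in for the converted array (an assumption for illustration; it does not go through the actual Arrow conversion path), and shows that an explicit copy is writable again:

```python
import numpy as np

# Stand-in for the array returned by the conversion: a NumPy array whose
# writeable flag is off (simulated here; no pyarrow involved).
converted = np.array([1, 2, 3])
converted.flags.writeable = False

try:
    converted[0] = 10
    write_failed = False
except ValueError:
    # NumPy refuses in-place assignment to a read-only array.
    write_failed = True

# An explicit copy owns its own memory, so it can be modified.
writable = converted.copy()
writable[0] = 10

print(converted.flags.writeable, writable.flags.writeable, write_failed)
```

Copying sidesteps the error at the cost of extra memory. Whether the conversion itself should hand back writable arrays is a separate question that this sketch does not address.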
> [Python] Pandas-Arrow roundtrip makes pd categorical index not writeable
> ------------------------------------------------------------------------
>
> Key: ARROW-3801
> URL: https://issues.apache.org/jira/browse/ARROW-3801
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.10.0
> Reporter: Thomas Buhrmann
> Priority: Major
> Fix For: 0.14.0
>
>
> Serializing and deserializing a pandas series with categorical dtype makes
> the categorical index non-writeable, which in turn trips up pandas when,
> e.g., reordering the categories, raising "ValueError: buffer source array is
> read-only":
> {code}
> import pandas as pd
> import pyarrow as pa
>
> # Original frame: the categories are writeable, so reordering works.
> df = pd.Series([1,2,3], dtype='category', name="c1").to_frame()
> print("DType before:", repr(df.c1.dtype))
> print("Writeable:", df.c1.cat.categories.values.flags.writeable)
> ro = df.c1.cat.reorder_categories([3,2,1])
> print("DType reordered:", repr(ro.dtype), "\n")
>
> # After the Arrow round trip the categories are read-only and reordering fails.
> tbl = pa.Table.from_pandas(df)
> df2 = tbl.to_pandas()
> print("DType after:", repr(df2.c1.dtype))
> print("Writeable:", df2.c1.cat.categories.values.flags.writeable)
> ro = df2.c1.cat.reorder_categories([3,2,1])
> print("DType reordered:", repr(ro.dtype), "\n")
> {code}
>
> Outputs:
>
> {code:java}
> DType before: CategoricalDtype(categories=[1, 2, 3], ordered=False)
> Writeable: True
> DType reordered: CategoricalDtype(categories=[3, 2, 1], ordered=False)
> DType after: CategoricalDtype(categories=[1, 2, 3], ordered=False)
> Writeable: False
> ---------------------------------------------------------------------------
> ValueError Traceback (most recent call last)
> <ipython-input-365-85b439586c1a> in <module>
> 12 print("DType after:", repr(df2.c1.dtype))
> 13 print("Writeable:", df2.c1.cat.categories.values.flags.writeable)
> ---> 14 ro = df2.c1.cat.reorder_categories([3,2,1])
> 15 print("DType reordered:", repr(ro.dtype), "\n")
> {code}
>
>
>