[
https://issues.apache.org/jira/browse/ARROW-17636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601174#comment-17601174
]
Joris Van den Bossche commented on ARROW-17636:
-----------------------------------------------
[~rhlobo] could you provide a reproducible example? (A traceback would be
helpful as well.)
I don't directly see any problem with the following simple example converting a
dictionary column with int32 indices and dictionary to a pandas categorical:
{code}
In [1]: import pyarrow as pa
In [2]: table = pa.table({'col': pa.DictionaryArray.from_arrays(pa.array([0, 1, 0], pa.int32()), pa.array([10, 11], pa.int32()))})
In [3]: table
Out[3]:
pyarrow.Table
col: dictionary<values=int32, indices=int32, ordered=0>
----
col: [ -- dictionary:
[10,11] -- indices:
[0,1,0]]
In [4]: table.to_pandas()
Out[4]:
  col
0  10
1  11
2  10
In [5]: table.to_pandas().dtypes
Out[5]:
col    category
dtype: object
{code}
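For reference, the values shown in Out[4] follow from how a dictionary-encoded column is laid out: each index selects one entry of the dictionary. The snippet below is only a pure-Python sketch of that lookup, not how pyarrow or pandas implement the conversion. The helper name `decode_dictionary` is hypothetical, and representing a null index as `None` is an assumption made for illustration.

```python
from typing import List, Optional, Sequence


def decode_dictionary(
    indices: Sequence[Optional[int]], dictionary: Sequence[int]
) -> List[Optional[int]]:
    """Expand dictionary-encoded data: each index selects one dictionary entry."""
    # Assumption for this sketch: a null index is represented as None
    # and decodes to a missing value.
    return [None if i is None else dictionary[i] for i in indices]


# Same data as in the example above: indices [0, 1, 0], dictionary [10, 11].
indices = [0, 1, 0]
dictionary = [10, 11]
decoded = decode_dictionary(indices, dictionary)
print(decoded)
```

In pandas terms, the indices would correspond to the categorical's codes and the dictionary to its categories.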
> Converting Table to pandas raises NotImplementedError (when table previously
> saved as partitioned parquet dataset)
> ------------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-17636
> URL: https://issues.apache.org/jira/browse/ARROW-17636
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 9.0.0
> Environment: Docker container, based on continuumio/anaconda3
> Python 3.9.12
> PyArrow 9.0.0
> Reporter: Roberto Lobo
> Priority: Major
>
> When converting a table in which one of the columns is of type
> DictionaryType (values=int32, indices=int32, ordered=0), the conversion to
> a pandas DataFrame fails with:
> NotImplementedError: dictionary<values=int32, indices=int32, ordered=0>
> The conversion for this dictionary type is not implemented yet.
> This DictionaryType is the type assigned when one of the columns (Int64) is
> used as one of the parquet dataset's partition columns.