[jira] [Commented] (ARROW-17636) Converting Table to pandas raises NotImplementedError (when table previously saved as partitioned parquet dataset)

Joris Van den Bossche (Jira) Wed, 07 Sep 2022 00:32:08 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-17636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601174#comment-17601174
 ]


Joris Van den Bossche commented on ARROW-17636:
-----------------------------------------------

[~rhlobo] could you provide a reproducible example? (A full traceback would be 
helpful as well.)

I don't directly see any problem with the following simple example, which 
converts a dictionary column with int32 indices and an int32 dictionary to a 
pandas categorical:

{code}
In [1]: import pyarrow as pa

In [2]: table = pa.table({'col': pa.DictionaryArray.from_arrays(pa.array([0, 1, 0], pa.int32()), pa.array([10, 11], pa.int32()))})

In [3]: table
Out[3]: 
pyarrow.Table
col: dictionary<values=int32, indices=int32, ordered=0>
----
col: [  -- dictionary:
[10,11]  -- indices:
[0,1,0]]

In [4]: table.to_pandas()
Out[4]: 
  col
0  10
1  11
2  10

In [5]: table.to_pandas().dtypes
Out[5]: 
col    category
dtype: object
{code}

> Converting Table to pandas raises NotImplementedError (when table previously 
> saved as partitioned parquet dataset)
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-17636
>                 URL: https://issues.apache.org/jira/browse/ARROW-17636
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 9.0.0
>         Environment: Docker container, based on continuumio/anaconda3
> Python 3.9.12
> PyArrow 9.0.0
>            Reporter: Roberto Lobo
>            Priority: Major
>
> When converting a table in which one of the columns has type 
> DictionaryType (values=int32, indices=int32, ordered=0), the conversion to a 
> pandas DataFrame fails with:
> NotImplementedError: dictionary<values=int32, indices=int32, ordered=0>
> The conversion for this dictionary type is not implemented yet.
> This DictionaryType is used as the column type when one of the columns (Int64) 
> is used as a partition column of the parquet dataset.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
