[
https://issues.apache.org/jira/browse/ARROW-17636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600947#comment-17600947
]
Roberto Lobo edited comment on ARROW-17636 at 9/6/22 7:09 PM:
--------------------------------------------------------------
Using a workaround:
{code:java}
conversion_options['types_mapper'] = _TYPE_MAPPINGS.get
try:
    data = table.to_pandas(**conversion_options)
except NotImplementedError:
    # FIX NotImplemented that happens when partition column is int
    problematic_columns = []
    for tcolumn in list(table.columns):
        if isinstance(tcolumn.type, pa.DictionaryType):
            if (pa.types.is_integer(tcolumn.type.value_type) and
                    pa.types.is_integer(tcolumn.type.index_type)):
                problematic_columns.append((tcolumn._name, tcolumn, pa.int64()))
    for tcolumn_name, tcolumn, tcolumn_type in problematic_columns:
        table = table.drop([tcolumn_name])
        table = table.append_column(pa.field(tcolumn._name, tcolumn_type),
                                    [tcolumn.to_pylist()])
    data = table.to_pandas(**conversion_options)
##############################################################
{code}
was (Author: JIRAUSER295439):
Using a workaround:
{code:java}
conversion_options['types_mapper'] = _TYPE_MAPPINGS.get
try:
    data = table.to_pandas(**conversion_options)
except NotImplementedError:
    # FIX NotImplemented that happens when partition column is int
    problematic_columns = []
    for tcolumn in list(table.columns):
        if isinstance(tcolumn.type, pa.DictionaryType):
            if (pa.types.is_integer(tcolumn.type.value_type) and
                    pa.types.is_integer(tcolumn.type.index_type)):
                problematic_columns.append((tcolumn._name, tcolumn, pa.int64()))
    for tcolumn_name, tcolumn, tcolumn_type in problematic_columns:
        table = table.drop([tcolumn_name])
        # table = table.append_column(tcolumn_name, [tcolumn.to_numpy()])
        table = table.append_column(pa.field(tcolumn._name, tcolumn_type),
                                    [tcolumn.to_pylist()])
    data = table.to_pandas(**conversion_options)
##############################################################
{code}
> Converting Table to pandas raises NotImplementedError (when table previously
> saved as partitioned parquet dataset)
> ------------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-17636
> URL: https://issues.apache.org/jira/browse/ARROW-17636
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 9.0.0
> Environment: Docker container, based on continuumio/anaconda3
> Python 3.9.12
> PyArrow 9.0.0
> Reporter: Roberto Lobo
> Priority: Major
>
> When converting a table in which one column's type is DictionaryType
> (values=int32, indices=int32, ordered=0), the conversion to a pandas
> DataFrame fails with:
> NotImplementedError: dictionary<values=int32, indices=int32, ordered=0>
> The conversion is not implemented yet for this dictionary type.
> This DictionaryType appears when one of the columns (Int64) is used as
> one of the parquet dataset's partition columns.