[
https://issues.apache.org/jira/browse/ARROW-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103186#comment-16103186
]
Wes McKinney commented on ARROW-1286:
-------------------------------------
Note that Parquet does not have a categorical type, and dictionary encoding
cannot necessarily be relied upon universally to write categoricals. We can
look at the pandas schema in the metadata and make a best effort to
reconstruct the original pandas.Categorical.
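
To illustrate why the dictionary alone is not enough, here is a minimal
standard-library sketch. It is not pyarrow's implementation. The helper names
(dictionary_encode, categorical_columns) are hypothetical, and the metadata
blob is a made-up example whose field names are assumed to follow the pandas
metadata layout ("pandas_type", "metadata", "ordered").

```python
import json

# Hypothetical example of the JSON stored under the "pandas" key of the
# schema metadata. Field names are assumed; values are made up.
PANDAS_METADATA = json.dumps({
    "columns": [
        {"name": "a", "pandas_type": "categorical", "numpy_type": "int8",
         "metadata": {"num_categories": 4, "ordered": False}},
        {"name": "b", "pandas_type": "int64", "numpy_type": "int64",
         "metadata": None},
    ]
})


def dictionary_encode(values):
    """Return (dictionary, indices), with entries in first-seen order."""
    dictionary = []
    positions = {}
    indices = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def categorical_columns(metadata_json):
    """Map each column flagged as categorical to its 'ordered' flag."""
    meta = json.loads(metadata_json)
    result = {}
    for col in meta.get("columns", []):
        if col.get("pandas_type") == "categorical":
            extra = col.get("metadata") or {}
            result[col["name"]] = bool(extra.get("ordered", False))
    return result


# A categorical declared with categories a, b, c, d but only three observed:
# the encoded dictionary loses "d" and the declared category order.
dictionary, indices = dictionary_encode(["b", "a", "b", "c"])
flags = categorical_columns(PANDAS_METADATA)
```

The dictionary only records values that actually occur, in the order they
were first seen, so unused categories and the ordered flag have to come from
the stored pandas metadata.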
> PYTHON: support Categorical serialization to/from parquet
> ---------------------------------------------------------
>
> Key: ARROW-1286
> URL: https://issues.apache.org/jira/browse/ARROW-1286
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Jeff Reback
> Assignee: Florian Jetter
> Fix For: 0.6.0
>
>
> related to https://issues.apache.org/jira/browse/ARROW-439
> pandas Categorical types are not implemented (NotImplemented). Minimal example:
> pandas 0.20.3 & pyarrow 0.5.0
> {code}
> In [1]: df = pd.DataFrame({'a': pd.Categorical(list('abc'))})
> In [2]: df.dtypes
> Out[2]:
> a category
> dtype: object
> In [4]: import pyarrow
> In [5]: import pyarrow.parquet
> In [6]: table = pyarrow.Table.from_pandas(df, timestamps_to_ms=True)
> ...: pyarrow.parquet.write_table(
> ...: table, 'foo.pq')
> ...:
> ...:
> ---------------------------------------------------------------------------
> ArrowNotImplementedError Traceback (most recent call last)
> <ipython-input-6-4512e9a2e15e> in <module>()
> 1 table = pyarrow.Table.from_pandas(df, timestamps_to_ms=True)
> 2 pyarrow.parquet.write_table(
> ----> 3 table, 'foo.pq')
> 4
> /Users/jreback/miniconda3/envs/pandas/lib/python3.6/site-packages/pyarrow/parquet.py
> in write_table(table, where, row_group_size, version, use_dictionary,
> compression, use_deprecated_int96_timestamps, **kwargs)
> 770 version=version,
> 771
> use_deprecated_int96_timestamps=use_deprecated_int96_timestamps)
> --> 772 writer = ParquetWriter(where, table.schema, **options)
> 773 writer.write_table(table, row_group_size=row_group_size)
> 774 writer.close()
> _parquet.pyx in pyarrow._parquet.ParquetWriter.__cinit__()
> error.pxi in pyarrow.lib.check_status()
> ArrowNotImplementedError: NotImplemented: unhandled type
> {code}