[ 
https://issues.apache.org/jira/browse/ARROW-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103186#comment-16103186
 ] 

Wes McKinney commented on ARROW-1286:
-------------------------------------

Note that Parquet does not have a categorical type, and dictionary encoding 
cannot necessarily be relied upon universally to write categoricals. We can 
look at the pandas schema in the metadata and make a best effort to 
reconstruct the original pandas.Categorical. 
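
As a rough illustration of the metadata lookup described above, here is a
minimal sketch that scans a pandas metadata JSON blob for columns recorded as
categorical. The sample blob is hand-written for illustration, and the field
names used ("columns", "pandas_type", "metadata", "ordered") are assumptions
about the layout of that metadata, not something stated in this comment.
find_categorical_columns is a hypothetical helper, not a pyarrow API.

```python
import json

# Hypothetical sample of the JSON that pandas-aware writers store in the
# schema metadata. Field names are assumed, values are made up.
SAMPLE_PANDAS_METADATA = json.dumps({
    "columns": [
        {"name": "a", "pandas_type": "categorical", "numpy_type": "int8",
         "metadata": {"num_categories": 3, "ordered": False}},
        {"name": "b", "pandas_type": "int64", "numpy_type": "int64",
         "metadata": None},
    ]
})


def find_categorical_columns(pandas_metadata_json):
    """Return {column name: ordered flag} for columns marked categorical."""
    meta = json.loads(pandas_metadata_json)
    result = {}
    for col in meta.get("columns", []):
        if col.get("pandas_type") == "categorical":
            # Per-column metadata may be null for non-categorical columns,
            # so fall back to an empty dict before reading "ordered".
            extra = col.get("metadata") or {}
            result[col["name"]] = bool(extra.get("ordered", False))
    return result


categorical = find_categorical_columns(SAMPLE_PANDAS_METADATA)
print(categorical)
```

With the column names in hand, a reader could convert the decoded columns
back with something like df[name].astype("category"). That is only a best
effort: the original category order and any unused categories cannot be
recovered from the values alone.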

> PYTHON: support Categorical serialization to/from parquet
> ---------------------------------------------------------
>
>                 Key: ARROW-1286
>                 URL: https://issues.apache.org/jira/browse/ARROW-1286
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Jeff Reback
>            Assignee: Florian Jetter
>             Fix For: 0.6.0
>
>
> related to https://issues.apache.org/jira/browse/ARROW-439
> pandas Categorical types are not implemented. Minimal example with
> pandas 0.20.3 & pyarrow 0.5.0:
> {code}
> In [1]: df = pd.DataFrame({'a': pd.Categorical(list('abc'))})
> In [2]: df.dtypes
> Out[2]: 
> a    category
> dtype: object
> In [4]: import pyarrow
> In [5]: import pyarrow.parquet
> In [6]: table = pyarrow.Table.from_pandas(df, timestamps_to_ms=True)
>    ...: pyarrow.parquet.write_table(
>    ...:             table, 'foo.pq')
>    ...:             
>    ...: 
> ---------------------------------------------------------------------------
> ArrowNotImplementedError                  Traceback (most recent call last)
> <ipython-input-6-4512e9a2e15e> in <module>()
>       1 table = pyarrow.Table.from_pandas(df, timestamps_to_ms=True)
>       2 pyarrow.parquet.write_table(
> ----> 3             table, 'foo.pq')
>       4 
> /Users/jreback/miniconda3/envs/pandas/lib/python3.6/site-packages/pyarrow/parquet.py in write_table(table, where, row_group_size, version, use_dictionary, compression, use_deprecated_int96_timestamps, **kwargs)
>     770         version=version,
>     771         use_deprecated_int96_timestamps=use_deprecated_int96_timestamps)
> --> 772     writer = ParquetWriter(where, table.schema, **options)
>     773     writer.write_table(table, row_group_size=row_group_size)
>     774     writer.close()
> _parquet.pyx in pyarrow._parquet.ParquetWriter.__cinit__()
> error.pxi in pyarrow.lib.check_status()
> ArrowNotImplementedError: NotImplemented: unhandled type
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
