[ https://issues.apache.org/jira/browse/ARROW-6899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-6899:
----------------------------------
Labels: pull-request-available (was: )
> [Python] to_pandas() not implemented on list<dictionary<values=string,
> indices=int32>
> -------------------------------------------------------------------------------------
>
> Key: ARROW-6899
> URL: https://issues.apache.org/jira/browse/ARROW-6899
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.13.0, 0.15.0
> Reporter: Razvan Chitu
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Attachments: encoded.arrow
>
>
> Hi,
> {{pyarrow.Table.to_pandas()}} fails on an Arrow List Vector where the data
> vector is of type "dictionary encoded string". Here is the table schema as
> printed by pyarrow:
> {code:java}
> pyarrow.Table
> encodedList: list<$data$: dictionary<values=string, indices=int32, ordered=0> not null> not null
>   child 0, $data$: dictionary<values=string, indices=int32, ordered=0> not null
> metadata
> --------
> OrderedDict() {code}
> Here is the data (also attached to this ticket as encoded.arrow):
> {code:java}
> <pyarrow.lib.ChunkedArray object at 0x7f7ea6a748b8>
> [
>   [
>     -- dictionary:
>       [
>         "a",
>         "b",
>         "c",
>         "d"
>       ]
>     -- indices:
>       [
>         0,
>         1,
>         2
>       ],
>     -- dictionary:
>       [
>         "a",
>         "b",
>         "c",
>         "d"
>       ]
>     -- indices:
>       [
>         0,
>         3
>       ]
>   ]
> ] {code}
> This is the exception I got:
> {code:java}
> ---------------------------------------------------------------------------
> ArrowNotImplementedError Traceback (most recent call last)
> <ipython-input-10-5f865bc01df1> in <module>
> ----> 1 df.to_pandas()
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()
>
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()
>
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata)
>     700
>     701     _check_data_column_metadata_consistency(all_columns)
> --> 702     blocks = _table_to_blocks(options, table, categories)
>     703     columns = _deserialize_column_index(table, all_columns, column_indexes)
>     704
>
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/pandas_compat.py in _table_to_blocks(options, block_table, categories)
>     972
>     973     # Convert an arrow table to Block from the internal pandas API
> --> 974     result = pa.lib.table_to_blocks(options, block_table, categories)
>     975
>     976     # Defined above
>
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/table.pxi in pyarrow.lib.table_to_blocks()
>
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
>
> ArrowNotImplementedError: Not implemented type for list in DataFrameBlock: dictionary<values=string, indices=int32, ordered=0> {code}
> Note that the data vector on its own (the dictionary-encoded string array)
> converts successfully with to_pandas(); only the list wrapped around it fails.
> It'd be great if this could be addressed in the next version of pyarrow. In
> the meantime, is there anything I can do on my end to work around this
> unimplemented conversion?
> Thanks,
> Razvan
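
On the workaround question, one possible direction is to decode the dictionary before conversion, so that the list holds plain strings instead of dictionary indices. The sketch below is plain Python, not pyarrow API: the function name decode_list_of_dictionary is hypothetical, and the offsets [0, 3, 5] are an assumption inferred from the two rows printed in the ticket (indices 0, 1, 2 and 0, 3). It only illustrates the decoding step that the failing conversion would need to perform.

```python
from typing import List, Sequence


def decode_list_of_dictionary(
    offsets: Sequence[int],
    indices: Sequence[int],
    dictionary: Sequence[str],
) -> List[List[str]]:
    """Expand a list<dictionary<string>> column into plain lists of strings.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    # Look up every index in the dictionary to get the flat child values.
    flat = [dictionary[i] for i in indices]
    # Slice the flat values into one list per row using consecutive offsets.
    return [flat[start:stop] for start, stop in zip(offsets, offsets[1:])]


# Values taken from the ticket: two rows sharing the dictionary "a".."d".
dictionary = ["a", "b", "c", "d"]
indices = [0, 1, 2, 0, 3]   # row 0 -> 0, 1, 2; row 1 -> 0, 3
offsets = [0, 3, 5]         # assumed row boundaries for the two rows above

rows = decode_list_of_dictionary(offsets, indices, dictionary)
print(rows)
```

With pyarrow objects, the dictionary and indices would presumably come from the child DictionaryArray (its `dictionary` and `indices` attributes). How to obtain the list offsets depends on the pyarrow version, so that part is left out here.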