[ https://issues.apache.org/jira/browse/ARROW-6899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson reassigned ARROW-6899:
--------------------------------------

    Assignee: Neal Richardson  (was: Wes McKinney)

> [Python] to_pandas() not implemented on list<dictionary<values=string, 
> indices=int32>
> -------------------------------------------------------------------------------------
>
>                 Key: ARROW-6899
>                 URL: https://issues.apache.org/jira/browse/ARROW-6899
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.13.0, 0.15.0
>            Reporter: Razvan Chitu
>            Assignee: Neal Richardson
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.16.0
>
>         Attachments: encoded.arrow
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hi,
> {{pyarrow.Table.to_pandas()}} fails on an Arrow List Vector where the data 
> vector is of type "dictionary encoded string". Here is the table schema as 
> printed by pyarrow:
> {code:java}
> pyarrow.Table
> encodedList: list<$data$: dictionary<values=string, indices=int32, ordered=0> not null> not null
>   child 0, $data$: dictionary<values=string, indices=int32, ordered=0> not null
> metadata
> --------
> OrderedDict() {code}
> and here is the data (also attached to this ticket as a file):
> {code:java}
> <pyarrow.lib.ChunkedArray object at 0x7f7ea6a748b8>
> [
>   [
>     -- dictionary:
>       [
>         "a",
>         "b",
>         "c",
>         "d"
>       ]
>     -- indices:
>       [
>         0,
>         1,
>         2
>       ],
>     -- dictionary:
>       [
>         "a",
>         "b",
>         "c",
>         "d"
>       ]
>     -- indices:
>       [
>         0,
>         3
>       ]
>   ]
> ] {code}
> and here is the exception I got:
> {code:java}
> ---------------------------------------------------------------------------
> ArrowNotImplementedError                  Traceback (most recent call last)
> <ipython-input-10-5f865bc01df1> in <module>
> ----> 1 df.to_pandas()
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata)
>     700 
>     701     _check_data_column_metadata_consistency(all_columns)
> --> 702     blocks = _table_to_blocks(options, table, categories)
>     703     columns = _deserialize_column_index(table, all_columns, column_indexes)
>     704 
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/pandas_compat.py in _table_to_blocks(options, block_table, categories)
>     972 
>     973     # Convert an arrow table to Block from the internal pandas API
> --> 974     result = pa.lib.table_to_blocks(options, block_table, categories)
>     975 
>     976     # Defined above
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/table.pxi in pyarrow.lib.table_to_blocks()
> ~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowNotImplementedError: Not implemented type for list in DataFrameBlock: dictionary<values=string, indices=int32, ordered=0> {code}
> Note that the data vector itself can be loaded successfully by to_pandas.
> It would be great if this were addressed in the next version of pyarrow. For
> now, is there anything I can do on my end to work around this unimplemented
> conversion?
> Thanks,
> Razvan
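
For anyone reading the dump above, here is a rough sketch of what the failing conversion would be expected to produce for this column. It is plain Python and does not call pyarrow. The function name `decode_list_of_dictionary` is hypothetical, and the flat `indices` plus `offsets` layout is an assumption about how the two list entries are stored; only the dictionary and index values come from the report.

```python
from typing import List, Sequence


def decode_list_of_dictionary(
    dictionary: Sequence[str],
    indices: Sequence[int],
    offsets: Sequence[int],
) -> List[List[str]]:
    """Expand a list<dictionary<string>> column into plain Python lists.

    offsets[i]:offsets[i + 1] delimits the i-th list inside the flat indices.
    """
    rows = []
    for start, stop in zip(offsets[:-1], offsets[1:]):
        # Replace each index with the dictionary value it points to.
        rows.append([dictionary[i] for i in indices[start:stop]])
    return rows


# Values copied from the dump in the report; the offsets are assumed.
dictionary = ["a", "b", "c", "d"]
indices = [0, 1, 2, 0, 3]
offsets = [0, 3, 5]

decoded = decode_list_of_dictionary(dictionary, indices, offsets)
print(decoded)
```

Under those assumptions the column holds two rows, one per list entry, each containing ordinary strings. A workaround on the pyarrow side would presumably follow the same idea (decode the dictionary-encoded child before converting), but that is not shown here.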



