[ https://issues.apache.org/jira/browse/ARROW-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche updated ARROW-15767:
------------------------------------------
Summary: [Python] Arrow Table with DenseUnion fails to convert to Python
Pandas DataFrame (was: [Python] Arrow Table with Nullable DenseUnion fails to
convert to Python Pandas DataFrame)
> [Python] Arrow Table with DenseUnion fails to convert to Python Pandas
> DataFrame
> --------------------------------------------------------------------------------
>
> Key: ARROW-15767
> URL: https://issues.apache.org/jira/browse/ARROW-15767
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 6.0.1
> Reporter: Ben Baumgold
> Priority: Major
> Attachments: nothing.arrow
>
>
> A Feather file containing a column of nullable values raises an error when
> converted to a Pandas DataFrame. It can be read into a pyarrow.Table as follows:
> {code:python}
> In [1]: import pyarrow.feather as feather
> In [2]: t = feather.read_table("nothing.arrow")
> In [3]: t
> Out[3]:
> pyarrow.Table
> col: dense_union<: null=0, : int32 not null=1>
> child 0, : null
> child 1, : int32 not null
> ----
> col: [  -- is_valid: all not null
>   -- type_ids: [
>       1,
>       1,
>       1,
>       0
>     ]
>   -- value_offsets: [
>       0,
>       1,
>       2,
>       0
>     ]
>   -- child 0 type: null
> 1 nulls
>   -- child 1 type: int32
>     [
>       1,
>       2,
>       3
>     ]]
> {code}
> But when trying to convert the pyarrow.Table into a Pandas DataFrame, I get
> the following error:
> {code:python}
> In [4]: t.to_pandas()
> ---------------------------------------------------------------------------
> ArrowNotImplementedError Traceback (most recent call last)
> <ipython-input-25-8ba84762c39a> in <module>
> ----> 1 t.to_pandas()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
>     787     _check_data_column_metadata_consistency(all_columns)
>     788     columns = _deserialize_column_index(table, all_columns, column_indexes)
> --> 789     blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
>     790
>     791     axes = [columns, index]
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in _table_to_blocks(options, block_table, categories, extension_columns)
>    1126     # Convert an arrow table to Block from the internal pandas API
>    1127     columns = block_table.column_names
> -> 1128     result = pa.lib.table_to_blocks(options, block_table, categories,
>    1129                                     list(extension_columns.keys()))
>    1130     return [_reconstruct_block(item, columns, extension_columns)
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in pyarrow.lib.table_to_blocks()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowNotImplementedError: No known equivalent Pandas block for Arrow data of type dense_union<: null=0, : int32 not null=1> is known.
> {code}
> Note that the Arrow file is valid and can be read successfully by
> [Arrow.jl|https://github.com/apache/arrow-julia]. A related issue is
> [arrow-julia#285|https://github.com/apache/arrow-julia/issues/285]. The
> [^nothing.arrow] file used in this example is attached for convenience.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)