[ 
https://issues.apache.org/jira/browse/ARROW-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497425#comment-17497425
 ] 

Joris Van den Bossche commented on ARROW-15767:
-----------------------------------------------

There is nothing wrong with your file (it is indeed valid, as it can also be 
read by pyarrow into a pyarrow.Table), but as the error type indicates: this 
conversion is just not yet implemented. 

Specifically for union types, there are not yet many utilities implemented 
for the Python (numpy, pandas) <-> Arrow conversion layer. For example, 
converting a Python structure to a union array is also not yet implemented 
(for this I found ARROW-2774). For the missing conversion to Python, I didn't 
directly find an existing issue.

For conversion to Python, only conversion to a plain Python list is currently supported:

{code}
>>> t["col"].to_pylist()
[1, 2, 3, None]
{code}
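
To illustrate why that list comes out as it does, here is a rough pure-Python 
sketch of how a dense union is decoded, using the type_ids, value_offsets and 
child values printed in the report. The helper name decode_dense_union is 
hypothetical and this is not how pyarrow implements it internally; it only 
mirrors the layout (each slot picks a child by type id, then a position in 
that child by offset).

```python
from typing import Any, Dict, List, Sequence


def decode_dense_union(
    type_ids: Sequence[int],
    value_offsets: Sequence[int],
    children: Dict[int, Sequence[Any]],
) -> List[Any]:
    """Hypothetical helper: resolve each slot of a dense union to a Python value.

    For slot i, type_ids[i] selects the child and value_offsets[i] selects
    the position inside that child.
    """
    return [children[type_id][offset]
            for type_id, offset in zip(type_ids, value_offsets)]


# Values copied from the table printed in the report:
# child 0 is the null child (one null), child 1 holds the int32 values.
type_ids = [1, 1, 1, 0]
value_offsets = [0, 1, 2, 0]
children = {0: [None], 1: [1, 2, 3]}

decoded = decode_dense_union(type_ids, value_offsets, children)
print(decoded)  # [1, 2, 3, None]
```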

In general, we could convert an Arrow union type to an object-dtype array in 
numpy/pandas, but that might not always be very useful.

> [Python] Arrow Table with Nullable DenseUnion fails to convert to Python 
> Pandas DataFrame
> -----------------------------------------------------------------------------------------
>
>                 Key: ARROW-15767
>                 URL: https://issues.apache.org/jira/browse/ARROW-15767
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 6.0.1
>            Reporter: Ben Baumgold
>            Priority: Major
>         Attachments: nothing.arrow
>
>
> A feather file containing a column of nullable values errors when converting 
> to a Pandas DataFrame. It can be read into a pyarrow.Table as follows:
> {code:python}
> In [1]: import pyarrow.feather as feather
> In [2]: t = feather.read_table("nothing.arrow")
> In [3]: t
> Out[3]:
> pyarrow.Table
> col: dense_union<: null=0, : int32 not null=1>
>   child 0, : null
>   child 1, : int32 not null
> ----
> col: [  -- is_valid: all not null  -- type_ids:     [
>       1,
>       1,
>       1,
>       0
>     ]  -- value_offsets:     [
>       0,
>       1,
>       2,
>       0
>     ]  -- child 0 type: null
> 1 nulls  -- child 1 type: int32
>     [
>       1,
>       2,
>       3
>     ]]
> {code}
> But when trying to convert the pyarrow.Table into a Pandas DataFrame, I get 
> the following error:
> {code:python}
> In [4]: t.to_pandas()
> ---------------------------------------------------------------------------
> ArrowNotImplementedError                  Traceback (most recent call last)
> <ipython-input-25-8ba84762c39a> in <module>
> ----> 1 t.to_pandas()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi in 
> pyarrow.lib._PandasConvertible.to_pandas()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in 
> pyarrow.lib.Table._to_pandas()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in 
> table_to_blockmanager(options, table, categories, ignore_metadata, 
> types_mapper)
>     787     _check_data_column_metadata_consistency(all_columns)
>     788     columns = _deserialize_column_index(table, all_columns, 
> column_indexes)
> --> 789     blocks = _table_to_blocks(options, table, categories, 
> ext_columns_dtypes)
>     790
>     791     axes = [columns, index]
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in 
> _table_to_blocks(options, block_table, categories, extension_columns)
>    1126     # Convert an arrow table to Block from the internal pandas API
>    1127     columns = block_table.column_names
> -> 1128     result = pa.lib.table_to_blocks(options, block_table, categories,
>    1129                                     list(extension_columns.keys()))
>    1130     return [_reconstruct_block(item, columns, extension_columns)
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in 
> pyarrow.lib.table_to_blocks()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/error.pxi in 
> pyarrow.lib.check_status()
> ArrowNotImplementedError: No known equivalent Pandas block for Arrow data of 
> type dense_union<: null=0, : int32 not null=1> is known.
> {code}
> Note the Arrow file is valid and can be read successfully by 
> [Arrow.jl|https://github.com/apache/arrow-julia]. A related issue is 
> [arrow-julia#285|https://github.com/apache/arrow-julia/issues/285].  The  
> [^nothing.arrow]  file used in this example is attached for convenience.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
