[ https://issues.apache.org/jira/browse/ARROW-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497425#comment-17497425 ]
Joris Van den Bossche commented on ARROW-15767:
-----------------------------------------------

There is nothing wrong with your file (it is indeed valid, as it can also be read by pyarrow into a pyarrow.Table), but as the error type indicates, this conversion is just not yet implemented.

Specifically for the union types, there are not yet many utilities implemented for working with this kind of data in the Python (numpy, pandas) <-> Arrow conversion layer. For example, converting a Python structure to a union array is also not yet implemented (for this I found ARROW-2774). For the missing conversion to Python, I didn't directly find an issue.

For conversion to Python, only a conversion to a plain Python list is supported:

{code}
>>> t["col"].to_pylist()
[1, 2, 3, None]
{code}

In general, we could convert an Arrow union type to an object dtype array in numpy/pandas, but that might also not always be very useful.

> [Python] Arrow Table with Nullable DenseUnion fails to convert to Python
> Pandas DataFrame
> -----------------------------------------------------------------------------------------
>
>                 Key: ARROW-15767
>                 URL: https://issues.apache.org/jira/browse/ARROW-15767
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 6.0.1
>            Reporter: Ben Baumgold
>            Priority: Major
>         Attachments: nothing.arrow
>
>
> A feather file containing a column of nullable values errors when converting to
> a Pandas DataFrame.
> It can be read into a pyarrow.Table as follows:
> {code:python}
> In [1]: import pyarrow.feather as feather
> In [2]: t = feather.read_table("nothing.arrow")
> In [3]: t
> Out[3]:
> pyarrow.Table
> col: dense_union<: null=0, : int32 not null=1>
>   child 0, : null
>   child 1, : int32 not null
> ----
> col: [ -- is_valid: all not null -- type_ids: [
>       1,
>       1,
>       1,
>       0
>     ] -- value_offsets: [
>       0,
>       1,
>       2,
>       0
>     ] -- child 0 type: null
> 1 nulls -- child 1 type: int32
>     [
>       1,
>       2,
>       3
>     ]]
> {code}
> But when trying to convert the pyarrow.Table into a Pandas DataFrame, I get
> the following error:
> {code:python}
> In [4]: t.to_pandas()
> ---------------------------------------------------------------------------
> ArrowNotImplementedError Traceback (most recent call last)
> <ipython-input-25-8ba84762c39a> in <module>
> ----> 1 t.to_pandas()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi in
> pyarrow.lib._PandasConvertible.to_pandas()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in
> pyarrow.lib.Table._to_pandas()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in
> table_to_blockmanager(options, table, categories, ignore_metadata,
> types_mapper)
> 787 _check_data_column_metadata_consistency(all_columns)
> 788 columns = _deserialize_column_index(table, all_columns,
> column_indexes)
> --> 789 blocks = _table_to_blocks(options, table, categories,
> ext_columns_dtypes)
> 790
> 791 axes = [columns, index]
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in
> _table_to_blocks(options, block_table, categories, extension_columns)
> 1126 # Convert an arrow table to Block from the internal pandas API
> 1127 columns = block_table.column_names
> -> 1128 result = pa.lib.table_to_blocks(options, block_table, categories,
> 1129 list(extension_columns.keys()))
> 1130 return [_reconstruct_block(item, columns, extension_columns)
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in
> pyarrow.lib.table_to_blocks()
> ~/miniconda3/lib/python3.9/site-packages/pyarrow/error.pxi in
> pyarrow.lib.check_status()
> ArrowNotImplementedError: No known equivalent Pandas block for Arrow data of
> type dense_union<: null=0, : int32 not null=1> is known.
> {code}
> Note the Arrow file is valid and can be read successfully by
> [Arrow.jl|https://github.com/apache/arrow-julia]. A related issue is
> [arrow-julia#285|https://github.com/apache/arrow-julia/issues/285]. The
> [^nothing.arrow] file used in this example is attached for convenience.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
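
As a reading aid for the dense union printed in the quoted issue, here is a minimal pure-Python sketch of how `type_ids` and `value_offsets` select values from the two children. It is not pyarrow's implementation: the helper name `decode_dense_union` and the dict-of-lists representation of the children are assumptions made for illustration, and the arrays are copied by hand from the table output in the issue.

```python
def decode_dense_union(type_ids, value_offsets, children):
    """Resolve each slot of a dense union to a plain Python value.

    type_ids[i] selects the child, and value_offsets[i] selects the
    position inside that child. Hypothetical helper, illustration only.
    """
    return [children[t][o] for t, o in zip(type_ids, value_offsets)]


# Arrays copied from the printed table in the issue.
type_ids = [1, 1, 1, 0]
value_offsets = [0, 1, 2, 0]
children = {
    0: [None],     # child 0, type null: a single null
    1: [1, 2, 3],  # child 1, type int32
}

values = decode_dense_union(type_ids, value_offsets, children)
print(values)  # [1, 2, 3, None]
```

The result matches the `to_pylist()` output shown in the comment. Until a direct conversion exists, one possible workaround (a suggestion, not something proposed in the issue) is to build the pandas column by hand from that list, for example `pd.Series(t["col"].to_pylist(), dtype=object)`, which gives the object dtype representation the comment mentions.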