timsaucer opened a new issue, #47675:
URL: https://github.com/apache/arrow/issues/47675
### Describe the enhancement requested
When you attempt to create a pandas DataFrame from a pyarrow table that
contains a list of fixed-size binary values, the conversion fails with this error:
```
ArrowNotImplementedError: Not implemented type for Arrow list to pandas: fixed_size_binary[32]
```
Here is a minimal reproducible example:
```
import pyarrow as pa
import pandas as pd

# Create some sample 32-byte binary data
data = [
    b'\x00' * 32,                    # All zeros
    b'\xff' * 32,                    # All ones
    b'\x01\x02\x03\x04' * 8,         # Repeating pattern
    bytes(range(32)),                # Sequential bytes 0-31
    b'Hello World!' + b'\x00' * 20,  # Text padded with zeros
]

# Create a table with a fixed-size binary column
table = pa.table({
    'id': [1, 2, 3, 4, 5],
    'fixed_binary': pa.array(data, type=pa.binary(32))
})

# This works fine - when you have only a fixed size binary
print(table.to_pandas())

data_as_array = pa.array(data, type=pa.binary(32))
table2 = pa.table({
    "list_fixed_binary": pa.array([data_as_array],
                                  type=pa.list_(pa.binary(32)))
})

# This fails - when you have a list of fixed sized binary
table2.to_pandas()
```
The error generated is:
```
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
Cell In[17], line 1
----> 1 table2.to_pandas()

File ~/working/tmp/.venv/lib/python3.12/site-packages/pyarrow/array.pxi:1020, in pyarrow.lib._PandasConvertible.to_pandas()

File ~/working/tmp/.venv/lib/python3.12/site-packages/pyarrow/table.pxi:5177, in pyarrow.lib.Table._to_pandas()

File ~/working/tmp/.venv/lib/python3.12/site-packages/pyarrow/pandas_compat.py:806, in table_to_dataframe(options, table, categories, ignore_metadata, types_mapper)
    803 columns = _deserialize_column_index(table, all_columns, column_indexes)
    805 column_names = table.column_names
--> 806 result = pa.lib.table_to_blocks(options, table, categories,
    807                                 list(ext_columns_dtypes.keys()))
    808 if _pandas_api.is_ge_v3():
    809     from pandas.api.internals import create_dataframe_from_blocks

File ~/working/tmp/.venv/lib/python3.12/site-packages/pyarrow/table.pxi:4103, in pyarrow.lib.table_to_blocks()

File ~/working/tmp/.venv/lib/python3.12/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()

ArrowNotImplementedError: Not implemented type for Arrow list to pandas: fixed_size_binary[32]
```
In contrast, recent versions of pandas *do* support these data types if
you use the PyCapsule interface for Arrow.
```
# This works just fine
print(pd.DataFrame.from_records(table2))
```
This produces the expected output:
```
0
0 (b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0...
```
This was tested with pyarrow 21.0.0 and pandas 2.3.3.
### Component(s)
Python