timsaucer opened a new issue, #47675:
URL: https://github.com/apache/arrow/issues/47675

   ### Describe the enhancement requested
   
   When you attempt to create a pandas DataFrame from a pyarrow table that contains a list of fixed-size binary values, the conversion fails with the error:
   
   ```
   ArrowNotImplementedError: Not implemented type for Arrow list to pandas: fixed_size_binary[32]
   ```
   
   Here is a minimal reproducible example:
   
   ```
   import pyarrow as pa
   import pandas as pd
   
   # Create some sample 32-byte binary data
   data = [
       b'\x00' * 32,  # All zeros
       b'\xff' * 32,  # All ones
       b'\x01\x02\x03\x04' * 8,  # Repeating pattern
       bytes(range(32)),  # Sequential bytes 0-31
       b'Hello World!' + b'\x00' * 20,  # Text padded with zeros
   ]
   
   # Create a table with a fixed-size binary column
   table = pa.table({
       'id': [1, 2, 3, 4, 5],
       'fixed_binary': pa.array(data, type=pa.binary(32))
   })
   
   # This works fine - when you have only a fixed size binary
   print(table.to_pandas())
   
   data_as_array = pa.array(data, type=pa.binary(32))
   
   table2 = pa.table({
       "list_fixed_binary": pa.array([data_as_array], type=pa.list_(pa.binary(32)))
   })
   
   # This fails - when you have a list of fixed sized binary
   table2.to_pandas()
   ```
   
   The error generated is:
   
   ```
   ---------------------------------------------------------------------------
   ArrowNotImplementedError                  Traceback (most recent call last)
   Cell In[17], line 1
   ----> 1 table2.to_pandas()
   
   File ~/working/tmp/.venv/lib/python3.12/site-packages/pyarrow/array.pxi:1020, in pyarrow.lib._PandasConvertible.to_pandas()
   
   File ~/working/tmp/.venv/lib/python3.12/site-packages/pyarrow/table.pxi:5177, in pyarrow.lib.Table._to_pandas()
   
   File ~/working/tmp/.venv/lib/python3.12/site-packages/pyarrow/pandas_compat.py:806, in table_to_dataframe(options, table, categories, ignore_metadata, types_mapper)
       803 columns = _deserialize_column_index(table, all_columns, column_indexes)
       805 column_names = table.column_names
   --> 806 result = pa.lib.table_to_blocks(options, table, categories,
       807                                 list(ext_columns_dtypes.keys()))
       808 if _pandas_api.is_ge_v3():
       809     from pandas.api.internals import create_dataframe_from_blocks
   
   File ~/working/tmp/.venv/lib/python3.12/site-packages/pyarrow/table.pxi:4103, in pyarrow.lib.table_to_blocks()
   
   File ~/working/tmp/.venv/lib/python3.12/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()
   
   ArrowNotImplementedError: Not implemented type for Arrow list to pandas: fixed_size_binary[32]
   ```
   
   In contrast, recent versions of pandas *do* support these data types if you use the PyCapsule interface for Arrow.
   
   ```
   # This works just fine
   print(pd.DataFrame.from_records(table2))
   ```
   
   This produces the expected output:
   ```
                                                      0
   0  (b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0...
   ```
   
   This was tested on pyarrow 21.0.0 and pandas 2.3.3.
   
   ### Component(s)
   
   Python


