Damian Barabonkov created ARROW-18099:
-----------------------------------------

             Summary: Cannot create pandas categorical from table only with nulls
                 Key: ARROW-18099
                 URL: https://issues.apache.org/jira/browse/ARROW-18099
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 9.0.0
         Environment: OSX 12.6, M1 silicon
            Reporter: Damian Barabonkov


A pyarrow Table whose column contains only null values cannot be converted to 
a pandas DataFrame with that column as a categorical: passing the column in 
the categories argument of to_pandas raises an error. pandas itself does 
support "empty" categoricals, so a simple workaround is to convert the 
pa.Table without categories (the column loads as an object column) and then 
cast it to a categorical in pandas, which yields an empty categorical. 
However, that workaround does not solve the pyarrow bug at its root.

 

Sample reproducible example
```python
import pyarrow as pa

pylist = [{'x': None, '__index_level_0__': 2},
          {'x': None, '__index_level_0__': 3}]
tbl = pa.Table.from_pylist(pylist)

# Errors
df_broken = tbl.to_pandas(categories=["x"])

# Works
df_works = tbl.to_pandas()
df_works = df_works.astype({"x": "category"})
```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
