[jira] [Updated] (ARROW-18099) Cannot create pandas categorical from table only with nulls

Damian Barabonkov (Jira) Wed, 19 Oct 2022 06:36:18 -0700


     [ 
https://issues.apache.org/jira/browse/ARROW-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Damian Barabonkov updated ARROW-18099:
--------------------------------------
    Description: 
A pyarrow Table whose column contains only null values cannot be converted to 
a pandas DataFrame with that column as a categorical. However, pandas does 
support "empty" categoricals. Therefore, a simple workaround is to load the 
pa.Table column as an object first and then convert it, once in pandas, to a 
categorical, which will be empty. However, that does not solve the pyarrow bug 
at its root.

 

Sample reproducible example
{code:python}
import pyarrow as pa

pylist = [{'x': None, '__index_level_0__': 2}, {'x': None, '__index_level_0__': 3}]
tbl = pa.Table.from_pylist(pylist)

# Errors
df_broken = tbl.to_pandas(categories=["x"])

# Works
df_works = tbl.to_pandas()
df_works = df_works.astype({"x": "category"})
{code}

  was:
A pyarrow Table whose column contains only null values cannot be converted to 
a pandas DataFrame with that column as a categorical. However, pandas does 
support "empty" categoricals. Therefore, a simple workaround is to load the 
pa.Table column as an object first and then convert it, once in pandas, to a 
categorical, which will be empty. However, that does not solve the pyarrow bug 
at its root.

 

Sample reproducible example
```python
import pyarrow as pa

pylist = [{'x': None, '__index_level_0__': 2}, {'x': None, '__index_level_0__': 3}]
tbl = pa.Table.from_pylist(pylist)

# Errors
df_broken = tbl.to_pandas(categories=["x"])

# Works
df_works = tbl.to_pandas()
df_works = df_works.astype({"x": "category"})
```


> Cannot create pandas categorical from table only with nulls
> -----------------------------------------------------------
>
>                 Key: ARROW-18099
>                 URL: https://issues.apache.org/jira/browse/ARROW-18099
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 9.0.0
>         Environment: OSX 12.6
> M1 silicon
>            Reporter: Damian Barabonkov
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
