[ 
https://issues.apache.org/jira/browse/ARROW-14344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17431118#comment-17431118
 ] 

Reinier van Linschoten commented on ARROW-14344:
------------------------------------------------

I ran some more diagnostics, and I think the problem lies in empty 
pandas DataFrames whose columns have dtype "category".
See the code below:
{code:python}
import pandas as pd

columns = ['record_id', 'institute', 'survey_name', 'survey_instance_id', 
'created_on', 'sent_on', 'progress', 'completed_on', 'package_id', 'archived']

# Simple example, works
empty_df = pd.DataFrame(columns=columns)
empty_df.reset_index(drop=True).to_feather(
    "empty_df.feather",
    compression="uncompressed",
)

# Category dtypes: the file is written, but reading it crashes R
cat_df = pd.DataFrame(columns=columns).astype("category")
cat_df.reset_index(drop=True).to_feather(
    "cat_df.feather",
    compression="uncompressed",
)

# Int32 dtypes, work
int_df = pd.DataFrame(columns=columns).astype("int32")
int_df.reset_index(drop=True).to_feather(
    "int_df.feather",
    compression="uncompressed",
)
{code}
Then we can try to import it in R:
{code:r}
empty_df <- arrow::read_feather("empty_df.feather") # Works
int_df <- arrow::read_feather("int_df.feather") # Works
cat_df <- arrow::read_feather("cat_df.feather") # Crashes
{code}
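
To see how the three frames differ before they are written, the dtypes can be inspected in pandas alone. The following is a minimal sketch, not part of the original reproduction: the helper name summarize_dtypes is hypothetical, and the column list is shortened to two names for brevity.

```python
import pandas as pd

# Shortened column list (assumption: the original report uses ten columns).
columns = ["record_id", "institute"]

empty_df = pd.DataFrame(columns=columns)
cat_df = pd.DataFrame(columns=columns).astype("category")
int_df = pd.DataFrame(columns=columns).astype("int32")


def summarize_dtypes(df):
    """Map each column to (dtype name, number of categories or None).

    Hypothetical helper for illustration only.
    """
    summary = {}
    for col in df.columns:
        dtype = df[col].dtype
        if isinstance(dtype, pd.CategoricalDtype):
            # Categorical columns carry their own list of categories.
            summary[col] = (dtype.name, len(dtype.categories))
        else:
            summary[col] = (dtype.name, None)
    return summary


for name, df in [("empty_df", empty_df), ("cat_df", cat_df), ("int_df", int_df)]:
    print(name, len(df), summarize_dtypes(df))
```

Only cat_df has categorical columns, and each of them has zero categories as well as zero rows. pandas categoricals are stored as dictionary-encoded columns in Arrow, so an empty dictionary may be what the R reader trips over; that is a guess and is not confirmed by anything above.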

> [R][Python] Crash when reading empty .feather file
> --------------------------------------------------
>
>                 Key: ARROW-14344
>                 URL: https://issues.apache.org/jira/browse/ARROW-14344
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python, R
>         Environment: Ubuntu Server 20.04.3, arrow (R) 5.0.02, pyarrow 3.0.0 
> (Python), RStudio 1.4.1717, R 4.1.0
>            Reporter: Reinier van Linschoten
>            Priority: Major
>              Labels: R, arrow, bug, error, pandas, python
>
> I get an R Session Error in RStudio Server when I try to read an empty 
> .feather file.
> Error: The previous R session was abnormally terminated due to an unexpected 
> crash. You may have lost workspace data as a result of this crash. 
> Reproduce:
>  * Create empty pandas dataframe in Python
>  * Write to .feather file with .reset_index(drop=True) and 
> compression="uncompressed"
>  * Try to read data in RStudio with arrow::read_feather(path)
>  * Error
> I can read dataframes with one or more rows in RStudio.
> I can read the empty dataframe with pandas.read_feather(). This returns an 
> empty pandas dataframe.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)