Adam Black created ARROW-16575:
----------------------------------
Summary: arrow::write_dataset() does nothing with 0 row dataframes in R
Key: ARROW-16575
URL: https://issues.apache.org/jira/browse/ARROW-16575
Project: Apache Arrow
Issue Type: Improvement
Environment: Mac OS 12.3, R 4.1
Reporter: Adam Black
In R, a dataframe can have 0 rows and still have column names and types.
Expected behavior of arrow::write_dataset
I would expect it to be possible to have a FileSystemDataset with zero rows
that still carries the column names and types as metadata. When given a
dataframe with zero rows, arrow::write_dataset would create the target
directory and write that FileSystemDataset metadata.
Actual behavior
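arrow::write_dataset() silently does nothing when passed a dataframe with zero
rows: no file is written and the target directory is not created, so a
subsequent arrow::open_dataset() call on that path fails.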
Reproducible example using the current arrow package on CRAN
{code:java}
arrow::write_dataset(cars, here::here("cars"))
arrow::open_dataset(here::here("cars"))
#> FileSystemDataset with 1 Parquet file
#> speed: double
#> dist: double
#>
#> See $metadata for additional Schema metadata
file.exists(here::here("cars"))
#> [1] TRUE
df <- cars[cars$speed > 1000, ]
nrow(df)
#> [1] 0
arrow::write_dataset(df, here::here("df"), format = "feather")
arrow::open_dataset(here::here("df"))
#> Error: IOError: Cannot list directory
#> '/private/var/folders/xx/01v98b6546ldnm1rg1_bvk000000gn/T/RtmpGkX0gK/reprex-17c305ed29ad5-nerdy-ram/df'.
#> Detail: [errno 2] No such file or directory
file.exists(here::here("df"))
#> [1] FALSE
{code}
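Possible workaround
Until write_dataset() handles this case, one option may be to fall back to a single-file write when the dataframe has no rows, so that the schema still reaches disk. The following is a minimal sketch, not a proposed fix: write_dataset_keep_empty() is a hypothetical helper that is not part of arrow, the "part-0" file name is an arbitrary choice, and it assumes that arrow::write_feather() and arrow::write_parquet() accept a zero-row dataframe.

```r
library(arrow)

# Hypothetical helper (not part of arrow): when the dataframe has no rows,
# write one schema-only file into the dataset directory instead of calling
# write_dataset(), which currently writes nothing in that case.
write_dataset_keep_empty <- function(df, path, format = c("parquet", "feather")) {
  format <- match.arg(format)
  if (nrow(df) > 0) {
    write_dataset(df, path, format = format)
    return(invisible(path))
  }
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  # The file name is an assumption chosen for this sketch.
  file <- file.path(path, paste0("part-0.", format))
  if (format == "parquet") {
    write_parquet(df, file)
  } else {
    write_feather(df, file)
  }
  invisible(path)
}

# Same zero-row dataframe as in the reproducible example above.
df <- cars[cars$speed > 1000, ]
out <- file.path(tempdir(), "df")
write_dataset_keep_empty(df, out, format = "feather")

# The directory now exists and can be opened as a dataset.
ds <- open_dataset(out, format = "feather")
ds$schema$names
```

This sketch ignores partitioning and the other write_dataset() arguments, so it only covers the simple unpartitioned case shown in the example.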