John Sheffield created ARROW-10485:
--------------------------------------
Summary: open_dataset(): specifying partition when hive_style = TRUE
fails silently
Key: ARROW-10485
URL: https://issues.apache.org/jira/browse/ARROW-10485
Project: Apache Arrow
Issue Type: Bug
Components: R
Affects Versions: 2.0.0
Environment: macOS Catalina 10.15.7 (19H2), R 4.01, arrow R package v2.0.0
Reporter: John Sheffield
When a dataset is written with hive_style = TRUE (now the default), it has to
be opened without an explicit partitioning argument to work as expected. If
the partitioning is specified, even correctly, any query that filters the
dataset on the partition field returns 0 rows.
From my perspective as a user, I'd want this to raise an error (not just a
warning), probably when first calling open_dataset().
```
library(dplyr)  # provides %>%, filter(), and collect()

data("mtcars")
arrow::write_dataset(dataset = mtcars, path = "mtcarstest",
                     partitioning = "cyl", format = "parquet",
                     hive_style = TRUE)

mtc1 <- arrow::open_dataset("mtcarstest", partitioning = "cyl")
mtc2 <- arrow::open_dataset("mtcarstest")

# Explicit partitioning: returns 0 rows
mtc1 %>%
  dplyr::filter(cyl == 4) %>%
  collect()

# No explicit partitioning: works as expected
mtc2 %>%
  dplyr::filter(cyl == 4) %>%
  collect()
```
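To illustrate the kind of early failure requested above, here is a rough user-side sketch in base R. The helper check_hive_partitioning() is hypothetical and is not part of the arrow package. It assumes that hive-style partition directories are named key=value (for example cyl=4) and it only inspects the top-level subdirectories of the dataset path.

```r
# Hypothetical helper, not part of arrow: stop early when a directory looks
# hive-style but partition names are also being supplied explicitly.
check_hive_partitioning <- function(path, partitioning = NULL) {
  # Immediate subdirectories only, e.g. "cyl=4", "cyl=6", "cyl=8"
  subdirs <- list.dirs(path, full.names = FALSE, recursive = FALSE)
  # Assumption: hive-style directories are named "key=value"
  looks_hive <- length(subdirs) > 0 && all(grepl("^[^=]+=", subdirs))
  if (looks_hive && is.character(partitioning)) {
    stop(
      "'", path, "' appears to use hive-style partitions (e.g. '",
      subdirs[1], "'); omit `partitioning` when opening it.",
      call. = FALSE
    )
  }
  invisible(path)
}
```

With the dataset from the example, check_hive_partitioning("mtcarstest", partitioning = "cyl") would stop with an error, while check_hive_partitioning("mtcarstest") would return silently, so it could be called just before open_dataset().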
--
This message was sent by Atlassian Jira
(v8.3.4#803005)