[
https://issues.apache.org/jira/browse/ARROW-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Neal Richardson updated ARROW-10485:
------------------------------------
Summary: [R] Accept partitioning in open_dataset when file paths are
hive-style (was: [R] open_dataset(): specifying partition when hive_style
=TRUE fails silently)
> [R] Accept partitioning in open_dataset when file paths are hive-style
> ----------------------------------------------------------------------
>
> Key: ARROW-10485
> URL: https://issues.apache.org/jira/browse/ARROW-10485
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 2.0.0
> Environment: MacOS Catalina 10.15.7 (19H2), R 4.01, arrow R package
> v2.0.0
> Reporter: John Sheffield
> Assignee: Neal Richardson
> Priority: Critical
> Fix For: 7.0.0
>
>
> When a dataset is written with hive_style = TRUE (now the default), it has
> to be opened without an explicit partitioning argument to work as expected.
> If partitioning is specified in open_dataset(), even correctly, any query
> that filters on the partition field returns 0 rows.
>
> As a user, I'd want this to raise an error (not just a warning), probably
> when open_dataset() is first called.
> {code:r}
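> library(dplyr)  # provides %>% and collect()
>
> data("mtcars")
> arrow::write_dataset(
>   dataset = mtcars, path = "mtcarstest", partitioning = "cyl",
>   format = "parquet", hive_style = TRUE)
>
> # Opened with an explicit partition field: filtering on cyl returns 0 rows
> mtc1 <- arrow::open_dataset("mtcarstest", partitioning = "cyl")
> # Opened without partitioning: works as expected
> mtc2 <- arrow::open_dataset("mtcarstest")
>
> mtc1 %>%
>   dplyr::filter(cyl == 4) %>%
>   collect()  # 0 rows (the bug)
> mtc2 %>%
>   dplyr::filter(cyl == 4) %>%
>   collect()  # rows with cyl == 4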
> {code}
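
On the affected version, one possible workaround is to check whether the dataset directory uses hive-style `key=value` folder names before deciding whether to pass `partitioning`. The sketch below uses base R only. `is_hive_style()` is a hypothetical helper, not part of the arrow package, and it inspects only the first level of subdirectories, which is an assumption that fits the single-field `cyl` example above.

```r
# Hypothetical helper (not an arrow function): returns TRUE when `path` has
# at least one subdirectory and every first-level subdirectory is named
# like "key=value".
is_hive_style <- function(path) {
  subdirs <- list.dirs(path, full.names = FALSE, recursive = FALSE)
  length(subdirs) > 0 && all(grepl("^[^=]+=.+$", subdirs))
}

# Build two throwaway directory trees to exercise the helper.
hive_root <- tempfile("hive_demo_")
for (d in c("cyl=4", "cyl=6", "cyl=8")) {
  dir.create(file.path(hive_root, d), recursive = TRUE)
}

plain_root <- tempfile("plain_demo_")
for (d in c("4", "6", "8")) {
  dir.create(file.path(plain_root, d), recursive = TRUE)
}

hive_detected  <- is_hive_style(hive_root)   # key=value folder names
plain_detected <- is_hive_style(plain_root)  # bare value folder names

# Possible use with the dataset from the report (requires the arrow package):
# ds <- if (is_hive_style("mtcarstest")) {
#   arrow::open_dataset("mtcarstest")
# } else {
#   arrow::open_dataset("mtcarstest", partitioning = "cyl")
# }
```

With a check like this, `partitioning` is passed only for directories whose folder names do not already carry the field name, which avoids the combination reported here as returning 0 rows.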