[ 
https://issues.apache.org/jira/browse/ARROW-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-10485:
------------------------------------
    Summary: [R] Accept partitioning in open_dataset when file paths are 
hive-style  (was: [R] open_dataset(): specifying partition when hive_style 
=TRUE fails silently)

> [R] Accept partitioning in open_dataset when file paths are hive-style
> ----------------------------------------------------------------------
>
>                 Key: ARROW-10485
>                 URL: https://issues.apache.org/jira/browse/ARROW-10485
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 2.0.0
>         Environment: MacOS Catalina 10.15.7 (19H2), R 4.01, arrow R package 
> v2.0.0
>            Reporter: John Sheffield
>            Assignee: Neal Richardson
>            Priority: Critical
>             Fix For: 7.0.0
>
>
> When a dataset is written with hive_style = TRUE (now the default), it must 
> be opened without an explicit partitioning argument to work as expected. If 
> partitioning is specified, even correctly, any query that filters on the 
> partition field returns 0 rows.
>  
> As a user, I would want this to raise an error (not just a warning), 
> probably when open_dataset() is first called.
> {code:r}
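> library(dplyr)  # provides %>% and collect()
>
> # Write mtcars as a hive-style partitioned dataset (directories like cyl=4/).
> data("mtcars")
> arrow::write_dataset(
>     dataset = mtcars, path = "mtcarstest", partitioning = "cyl",
>     format = "parquet", hive_style = TRUE)
>
> # Open once with an explicit partitioning and once without.
> mtc1 <- arrow::open_dataset("mtcarstest", partitioning = "cyl")
> mtc2 <- arrow::open_dataset("mtcarstest")
>
> # Explicit partitioning: returns 0 rows.
> mtc1 %>%
>      dplyr::filter(cyl == 4) %>%
>      collect()
>
> # No partitioning argument: works as expected.
> mtc2 %>%
>      dplyr::filter(cyl == 4) %>%
>      collect()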
>  {code}
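
One possible reading of the zero-row result in the report above is sketched here in base R. It is an assumption, not something confirmed in this issue: if a path segment such as "cyl=4" is read as a plain directory-style value, the field holds the whole string "cyl=4" and a filter for 4 matches nothing. The helpers path_segments(), parse_hive() and parse_directory() are hypothetical and are not part of the arrow package. The file name part-0.parquet is illustrative only.

```r
# Hypothetical helper (not part of arrow): split a relative file path into
# its directory segments, dropping the trailing file name.
path_segments <- function(path) {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  parts[-length(parts)]
}

# Hive-style reading: each segment is "key=value" and the key names the field.
parse_hive <- function(path) {
  kv <- strsplit(path_segments(path), "=", fixed = TRUE)
  values <- vapply(kv, function(x) x[2], character(1))
  names(values) <- vapply(kv, function(x) x[1], character(1))
  values
}

# Directory-style reading: the caller supplies the field names and each
# whole segment is taken as the value.
parse_directory <- function(path, fields) {
  values <- path_segments(path)
  names(values) <- fields
  values
}

path <- "cyl=4/part-0.parquet"  # illustrative path, assumed layout
hive <- parse_hive(path)
dir_style <- parse_directory(path, "cyl")

hive[["cyl"]]       # "4"
dir_style[["cyl"]]  # "cyl=4"
```

Under this assumption, accepting a partitioning argument for hive-style paths amounts to taking the field names (and possibly types) from the argument while still reading values from the right-hand side of each "key=value" segment.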



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
