[ 
https://issues.apache.org/jira/browse/ARROW-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman reassigned ARROW-10485:
------------------------------------

    Assignee: Ben Kietzman

> open_dataset(): specifying partition when hive_style =TRUE fails silently
> -------------------------------------------------------------------------
>
>                 Key: ARROW-10485
>                 URL: https://issues.apache.org/jira/browse/ARROW-10485
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 2.0.0
>         Environment: MacOS Catalina 10.15.7 (19H2), R 4.01, arrow R package 
> v2.0.0
>            Reporter: John Sheffield
>            Assignee: Ben Kietzman
>            Priority: Minor
>
> When a dataset is written with hive_style = TRUE (now the default), it has 
> to be opened without an explicit partitioning argument to work as expected. 
> Even if the correct partition field is passed to open_dataset(), any query 
> that filters on the partition field returns 0 rows.
>
> From my perspective as a user, I'd want this to raise an error (not just a 
> warning), probably when open_dataset() is first called.
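> ```
> library(dplyr)  # provides %>%, filter() and collect()
> data("mtcars")
> # Write a hive-style partitioned dataset (directories named cyl=4, cyl=6, ...)
> arrow::write_dataset(dataset = mtcars, path = "mtcarstest",
>                      partitioning = "cyl", format = "parquet",
>                      hive_style = TRUE)
> # Opened with an explicit partition field: filtering on cyl returns 0 rows
> mtc1 <- arrow::open_dataset("mtcarstest", partitioning = "cyl")
> # Opened without partitioning: filtering on cyl works as expected
> mtc2 <- arrow::open_dataset("mtcarstest")
> mtc1 %>%
>     dplyr::filter(cyl == 4) %>%
>     collect()
> mtc2 %>%
>     dplyr::filter(cyl == 4) %>%
>     collect()
> ```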



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
