[ 
https://issues.apache.org/jira/browse/ARROW-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124984#comment-17124984
 ] 

Francois Saint-Jacques commented on ARROW-8726:
-----------------------------------------------

[~jorisvandenbossche]

Which of these would you like to see solved?

1. The file name is used as a partition value. Should we only consider the 
directory of the base path? This ambiguity goes away with HivePartitioning, 
since the file name won't be parsed.
2. Passing an "extra" key without a value generates an error. The other option 
would be to default to NullType.
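
To make the two options concrete, here is a rough sketch in plain R. It is 
not the C++ implementation and does not use any arrow API; the helper 
`parse_directory_partition` and its `missing` argument are hypothetical names 
invented for illustration. It assumes paths are relative to the dataset root.

```r
# Hypothetical helper (not part of arrow): map the directory segments of a
# file path onto the partition field names.
parse_directory_partition <- function(path, fields,
                                      missing = c("error", "null")) {
  missing <- match.arg(missing)

  # Option 1: drop the file name and split only the directory part.
  dir_part <- dirname(path)
  segments <- if (dir_part == ".") {
    character(0)
  } else {
    strsplit(dir_part, "/", fixed = TRUE)[[1]]
  }

  if (length(segments) > length(fields)) {
    stop("more directory levels than partition fields: ", path)
  }

  n_missing <- length(fields) - length(segments)
  if (n_missing > 0 && missing == "error") {
    unmatched <- fields[seq(length(segments) + 1, length(fields))]
    stop("no value for partition field(s): ",
         paste(unmatched, collapse = ", "))
  }

  # Option 2: fields without a directory level get NA (standing in for null).
  values <- as.list(c(segments, rep(NA_character_, n_missing)))
  names(values) <- fields
  values
}

# With the layout from the report, "nothing" has no directory level.
res <- parse_directory_partition("one/mtcars.parquet",
                                 c("level", "nothing"),
                                 missing = "null")
str(res)
```

With `missing = "error"` the same call stops with a message naming the 
unmatched field instead of silently taking `mtcars.parquet` as its value.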

> [C++][Dataset] Mis-specified DirectoryPartitioning incorrectly uses the file 
> name as value
> ------------------------------------------------------------------------------------------
>
>                 Key: ARROW-8726
>                 URL: https://issues.apache.org/jira/browse/ARROW-8726
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>            Reporter: Jonathan Keane
>            Assignee: Francois Saint-Jacques
>            Priority: Major
>             Fix For: 1.0.0
>
>
> Calling filter + collect on a dataset with a mis-specified partitioning 
> causes a segfault. Though this is clearly input error, it would be nice if 
> there was some guidance that something was wrong with the partitioning.
> {code:r}
> library(arrow)
> library(dplyr)
> dir.create("multi_mtcars/one", recursive = TRUE)
> dir.create("multi_mtcars/two", recursive = TRUE)
> write_parquet(mtcars, "multi_mtcars/one/mtcars.parquet")
> write_parquet(mtcars, "multi_mtcars/two/mtcars.parquet")
> ds <- open_dataset("multi_mtcars", partitioning = c("level", "nothing"))
> # the following will segfault
> ds %>%
>   filter(cyl > 8) %>% 
>   collect()
> {code}


