[
https://issues.apache.org/jira/browse/ARROW-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102032#comment-17102032
]
Jonathan Keane commented on ARROW-8726:
---------------------------------------
OK, I've installed the latest nightly (20200506) and I'm no longer getting the
segfault, but when I run filter + collect I get the following:
{code:r}
> ds <- open_dataset("multi_mtcars", partitioning = c("level", "nothing"))
>
> ds %>%
+ filter(cyl > 8) %>%
+ collect()
Error in Table__to_dataframe(x, use_threads = option_use_threads()) :
Must have at least one array to create a converter
{code}
I get similar output (with the filename in the mis-specified {{nothing}} column)
when I just collect without filtering:
{code:r}
> collect(ds)
# A tibble: 64 x 13
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb level nothing
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
 1    21     6   160   110   3.9  2.62  16.5     0     1     4     4 one   mtcars.parquet
 2    21     6   160   110   3.9  2.88  17.0     0     1     4     4 one   mtcars.parquet
 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1 one   mtcars.parquet
 4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1 one   mtcars.parquet
 5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2 one   mtcars.parquet
 6  18.1     6   225   105  2.76  3.46  20.2     1     0     3     1 one   mtcars.parquet
 7  14.3     8   360   245  3.21  3.57  15.8     0     0     3     4 one   mtcars.parquet
 8  24.4     4  147.    62  3.69  3.19    20     1     0     4     2 one   mtcars.parquet
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2 one   mtcars.parquet
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4 one   mtcars.parquet
# … with 54 more rows
{code}
> [R][Dataset] segfault with a mis-specified partition
> ----------------------------------------------------
>
> Key: ARROW-8726
> URL: https://issues.apache.org/jira/browse/ARROW-8726
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Reporter: Jonathan Keane
> Assignee: Francois Saint-Jacques
> Priority: Major
> Fix For: 1.0.0, 0.17.1
>
>
> Calling filter + collect on a dataset with a mis-specified partitioning
> causes a segfault. Though this is clearly an input error, it would be nice if
> there were some guidance that something is wrong with the partitioning.
> {code:r}
> library(arrow)
> library(dplyr)
> dir.create("multi_mtcars/one", recursive = TRUE)
> dir.create("multi_mtcars/two", recursive = TRUE)
> write_parquet(mtcars, "multi_mtcars/one/mtcars.parquet")
> write_parquet(mtcars, "multi_mtcars/two/mtcars.parquet")
> ds <- open_dataset("multi_mtcars", partitioning = c("level", "nothing"))
> # the following will segfault
> ds %>%
> filter(cyl > 8) %>%
> collect()
> {code}