[ https://issues.apache.org/jira/browse/ARROW-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102032#comment-17102032 ]

Jonathan Keane commented on ARROW-8726:
---------------------------------------

Ok, I've got the latest nightly installed (20200506) and I'm no longer getting 
the segfault, though when I do filter/collect I get the following:

{code:r}
> ds <- open_dataset("multi_mtcars", partitioning = c("level", "nothing"))
> 
> ds %>%
+   filter(cyl > 8) %>% 
+   collect()
Error in Table__to_dataframe(x, use_threads = option_use_threads()) : 
  Must have at least one array to create a converter
{code}

When I just collect without the filter, it returns data, but the file name ends up in the mis-specified `nothing` column:
{code:r}
> collect(ds)
# A tibble: 64 x 13
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb level nothing       
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>         
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4 one   mtcars.parquet
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4 one   mtcars.parquet
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1 one   mtcars.parquet
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1 one   mtcars.parquet
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2 one   mtcars.parquet
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1 one   mtcars.parquet
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4 one   mtcars.parquet
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2 one   mtcars.parquet
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2 one   mtcars.parquet
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4 one   mtcars.parquet
# … with 54 more rows
{code}

> [R][Dataset] segfault with a mis-specified partition
> ----------------------------------------------------
>
>                 Key: ARROW-8726
>                 URL: https://issues.apache.org/jira/browse/ARROW-8726
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>            Reporter: Jonathan Keane
>            Assignee: Francois Saint-Jacques
>            Priority: Major
>             Fix For: 1.0.0, 0.17.1
>
>
> Calling filter + collect on a dataset with a mis-specified partitioning 
> causes a segfault. Though this is clearly input error, it would be nice if 
> there was some guidance that something was wrong with the partitioning.
> {code:r}
> library(arrow)
> library(dplyr)
> dir.create("multi_mtcars/one", recursive = TRUE)
> dir.create("multi_mtcars/two", recursive = TRUE)
> write_parquet(mtcars, "multi_mtcars/one/mtcars.parquet")
> write_parquet(mtcars, "multi_mtcars/two/mtcars.parquet")
> ds <- open_dataset("multi_mtcars", partitioning = c("level", "nothing"))
> # the following will segfault
> ds %>%
>   filter(cyl > 8) %>% 
>   collect()
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
