[jira] [Commented] (ARROW-14321) [C++] segfault in OrderBySinkNode when filtering to 0 rows with dictionary type

Neal Richardson (Jira) Thu, 14 Oct 2021 13:14:05 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-14321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429000#comment-17429000
 ]


Neal Richardson commented on ARROW-14321:
-----------------------------------------

Turns out this is in the same family as ARROW-13761: problems converting 
ChunkedArrays with 0 chunks to R. That issue was reported for the timestamp 
type, and [~westonpace] added a test that covers most types, but apparently 
not dictionary. Adding the dictionary type to the list reproduces the segfault:

{code}
diff --git a/r/tests/testthat/test-chunked-array.R b/r/tests/testthat/test-chunked-array.R
index 3be65f88f..0fa9ab656 100644
--- a/r/tests/testthat/test-chunked-array.R
+++ b/r/tests/testthat/test-chunked-array.R
@@ -206,7 +206,7 @@ test_that("ChunkedArray supports empty arrays (ARROW-13761)", {
     int8(), int16(), int32(), int64(), uint8(), uint16(), uint32(),
     uint64(), float32(), float64(), timestamp("ns"), binary(),
     large_binary(), fixed_size_binary(32), date32(), date64(),
-    decimal(4, 2)
+    decimal(4, 2), dictionary()
   )
 
   empty_filter <- ChunkedArray$create(type = bool())
{code}

> [C++] segfault in OrderBySinkNode when filtering to 0 rows with dictionary 
> type
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-14321
>                 URL: https://issues.apache.org/jira/browse/ARROW-14321
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Jonathan Keane
>            Assignee: Alexander Ocsa
>            Priority: Major
>              Labels: query-engine
>             Fix For: 6.0.0
>
>
> It appears to happen when one of the filter parts has no matching rows:
> {code:r}
> library(arrow)
> library(dplyr)
> first_date <- lubridate::ymd_hms("2015-04-29 03:12:39")
> df1 <- tibble::tibble(
>   int = 1:10,
>   dbl = as.numeric(1:10),
>   lgl = rep(c(TRUE, FALSE, NA, TRUE, FALSE), 2),
>   chr = letters[1:10],
>   fct = factor(LETTERS[1:10]),
>   ts = first_date + lubridate::days(1:10)
> )
> ds <- InMemoryDataset$create(df1)
> # works
> ds %>% 
>   filter(int < 8) %>%
>   arrange(dbl) %>%
>   collect()
> # segfaults
> ds %>% 
>   filter(int < 8, int > 55) %>%
>   arrange(dbl) %>%
>   collect()
> # segfaults
> ds %>% 
>   filter(int < 0) %>%
>   arrange(dbl) %>%
>   collect()
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
