[
https://issues.apache.org/jira/browse/ARROW-14321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429000#comment-17429000
]
Neal Richardson commented on ARROW-14321:
-----------------------------------------
This turns out to be in the same family as ARROW-13761: converting
ChunkedArrays with 0 chunks to R. That issue was reported against the
timestamp type, and [~westonpace] added a test that covers most types, but
apparently not dictionary. Adding the dictionary type to the list reproduces
the segfault:
{code}
diff --git a/r/tests/testthat/test-chunked-array.R b/r/tests/testthat/test-chunked-array.R
index 3be65f88f..0fa9ab656 100644
--- a/r/tests/testthat/test-chunked-array.R
+++ b/r/tests/testthat/test-chunked-array.R
@@ -206,7 +206,7 @@ test_that("ChunkedArray supports empty arrays (ARROW-13761)", {
     int8(), int16(), int32(), int64(), uint8(), uint16(), uint32(),
     uint64(), float32(), float64(), timestamp("ns"), binary(),
     large_binary(), fixed_size_binary(32), date32(), date64(),
-    decimal(4, 2)
+    decimal(4, 2), dictionary()
   )
   empty_filter <- ChunkedArray$create(type = bool())
{code}
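For a quicker check outside the test suite, the same conversion can be sketched directly. This is only a minimal sketch, not the actual test body: it assumes that filtering an empty ChunkedArray with the empty boolean filter from the diff (via `$Filter()`) leaves a ChunkedArray with 0 chunks, and the variable names are illustrative. On an affected build the last call may crash the R session, so run it in a throwaway session.

```r
library(arrow)

# dictionary() with its default index and value types
dict_type <- dictionary()

# Passing only a type to ChunkedArray$create() gives an empty ChunkedArray
one_empty_chunk <- ChunkedArray$create(type = dict_type)

# Assumption: filtering with an empty boolean ChunkedArray leaves 0 chunks
empty_filter <- ChunkedArray$create(type = bool())
zero_chunks <- one_empty_chunk$Filter(empty_filter)
zero_chunks$num_chunks

# Converting the 0-chunk dictionary ChunkedArray to R is the suspect step;
# on an affected build this call may segfault instead of returning
as.vector(zero_chunks)
```

If the conversion returns an empty vector instead of crashing, the build most likely already contains the fix.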
> [C++] segfault in OrderBySinkNode when filtering to 0 rows with dictionary
> type
> -------------------------------------------------------------------------------
>
> Key: ARROW-14321
> URL: https://issues.apache.org/jira/browse/ARROW-14321
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Jonathan Keane
> Assignee: Alexander Ocsa
> Priority: Major
> Labels: query-engine
> Fix For: 6.0.0
>
>
> It appears to happen when one of the filter parts has no matching rows:
> {code:r}
> library(arrow)
> library(dplyr)
> first_date <- lubridate::ymd_hms("2015-04-29 03:12:39")
> df1 <- tibble::tibble(
>   int = 1:10,
>   dbl = as.numeric(1:10),
>   lgl = rep(c(TRUE, FALSE, NA, TRUE, FALSE), 2),
>   chr = letters[1:10],
>   fct = factor(LETTERS[1:10]),
>   ts = first_date + lubridate::days(1:10)
> )
> ds <- InMemoryDataset$create(df1)
> # works
> ds %>%
>   filter(int < 8) %>%
>   arrange(dbl) %>%
>   collect()
> # segfaults
> ds %>%
>   filter(int < 8, int > 55) %>%
>   arrange(dbl) %>%
>   collect()
> # segfaults
> ds %>%
>   filter(int < 0) %>%
>   arrange(dbl) %>%
>   collect()
> {code}