Lennart Tuijnder created ARROW-14434:
----------------------------------------

             Summary: R crashes when making an empty selection for Datasets with DateTime
                 Key: ARROW-14434
                 URL: https://issues.apache.org/jira/browse/ARROW-14434
             Project: Apache Arrow
          Issue Type: Bug
          Components: Parquet, R
    Affects Versions: 5.0.0
         Environment: OS = Ubuntu 20.04 
I use the Architect IDE (an IDE based on Eclipse), but the crash also happens 
with just the R console. R = 3.6.3

See attached files for session info output and an R crash report. 
            Reporter: Lennart Tuijnder
         Attachments: RConsole.txt, sessionInfoOutput.txt

R (3.6.3) crashes when querying a dataset using the arrow Dataset 
functionality (see "?arrow::Dataset") when the following conditions are met:
 * The dataset to query contains a date-time/time column
 * An empty selection is made with dplyr::filter on the Dataset object
 * The dplyr::collect method is called -> (at this point the crash happens)

This crash happens whether the dataset is stored locally or in an S3 bucket.

Here is a minimal example to reproduce the bug:
{code:java}
library(dplyr)
library(lubridate)

# If you remove the dateTime column, no crash occurs.
df <- tibble(
        time = seq(5, 10, length.out = 10000),
        dateTime = as_datetime(1511870400) + time # dateTime column causes crash!
)
file <- tempdir()
arrow::write_dataset(df, file)

testdf <- arrow::open_dataset(file) %>%
        # filter(time > 5 & time < 6) %>% # a non-empty selection does not crash
        filter(time < 5) %>% # an empty selection crashes!
        collect() # the crash happens on collect()

{code}
R crashes with the following message:

**** caught segfault ****
*address 0x8, cause 'memory not mapped'*

I have included in the attachment the full R console output when running the 
above code.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
