[ https://issues.apache.org/jira/browse/ARROW-13694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pal updated ARROW-13694:
------------------------
    Description:
Hi,

I encounter a fatal error with the new version of Arrow R (5.0.0) that I did not have with its older version (4.0.1). Basically, after running "open_dataset", I filter and collect the data into a data frame; then RStudio crashes:

{code:java}
ds <- arrow::open_dataset(sources = "XXXX", partitioning = c("XX","YY","ZZ"))
df <- ds %>%
  filter(year >= 2014 & year <= 2020 & type %in% c("XX", "YY") & sector == "ABC" & identifier %in% list_identifiers & type == "LE" & val == "M") %>%
  select(period, obs_value) %>%
  collect()
{code}

Unfortunately, I can provide neither the exact code nor a reproduction of the problem. The dataset is very large and I do not understand the precise source of the error. All I know is that RStudio crashes and that this code worked perfectly in the older version of the package.

Also, please note that I disabled multithreading with:
{code:java}
options(arrow.use_threads = FALSE){code}

  was:
Hi,

I encounter a fatal error with the new version of Arrow R (5.0.0) that I did not have with its older version (4.0.1). Basically, after running "open_dataset", I filter and collect the data into a data frame; then RStudio crashes:

{code:java}
ds <- arrow::open_dataset(sources = "XXXX", partitioning = c("XX","YY","ZZ"))
df <- ds %>%
  filter(year >= 2014 & year <= 2020 & type %in% c("XX", "YY") & sector == "ABC" & identifier %in% list_identifiers & type == "LE" & val == "M") %>%
  select(period, obs_value) %>%
  collect()
{code}

Unfortunately, I can provide neither the exact code nor a reproduction of the problem. The dataset is very large and I do not understand the precise source of the error. All I know is that RStudio crashes and that his code worked perfectly in the older version of the package.

Also, please note that I disabled multithreading with:
{code:java}
options(arrow.use_threads = FALSE){code}


> [R] Arrow filter crashes (R aborted session)
> --------------------------------------------
>
>                 Key: ARROW-13694
>                 URL: https://issues.apache.org/jira/browse/ARROW-13694
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 5.0.0
>         Environment: RStudio Version
> --------------------------------------------------
> 1.4.1103
> Session Information
> --------------------------------------------------
> R version 4.0.4 (2021-02-15)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 10 x64 (build 18363)
> Matrix products: default
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252  LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
> attached base packages:
> [1] grid  stats  graphics  grDevices  utils  datasets  methods  base
> other attached packages:
>  [1] readxl_1.3.1  RJDBC_0.2-8  rJava_1.0-4  tibbletime_0.1.6  arrow_4.0.0.1
>  [6] rdbnomics_0.6.4  rstudioapi_0.13  scales_1.1.1  tidyquant_1.0.3  quantmod_0.4.18
> [11] TTR_0.24.2  PerformanceAnalytics_2.0.4  xts_0.12.1  zoo_1.8-9  skimr_2.1.3
> [16] janitor_2.1.0  DBI_1.1.1  R.utils_2.10.1  R.oo_1.24.0  R.methodsS3_1.8.1
> [21] devtools_2.4.2  usethis_2.0.1  R.cache_0.15.0  rmarkdown_2.10  kableExtra_1.3.4
> [26] knitr_1.33  plotly_4.9.4.1  RColorBrewer_1.1-2  ggpubr_0.4.0  ggrepel_0.9.1
> [31] ggExtra_0.9  haven_2.4.3  sas7bdat_0.5  data.table_1.14.0  lubridate_1.7.10
> [36] forcats_0.5.1  stringr_1.4.0  dplyr_1.0.7  purrr_0.3.4  readr_2.0.1
> [41] tidyr_1.1.3  tibble_3.1.3  ggplot2_3.3.5  tidyverse_1.3.1
> loaded via a namespace (and not attached):
>  [1] colorspace_2.0-2  ggsignif_0.6.2  ellipsis_0.3.2  rio_0.5.27  rprojroot_2.0.2  snakecase_0.11.0  base64enc_0.1-3  fs_1.5.0
>  [9] remotes_2.4.0  bit64_4.0.5  fansi_0.5.0  xml2_1.3.2  cachem_1.0.5  pkgload_1.2.1  jsonlite_1.7.2  broom_0.7.9
> [17] dbplyr_2.1.1  shiny_1.6.0  compiler_4.0.4  httr_1.4.2  backports_1.2.1  assertthat_0.2.1  fastmap_1.1.0  lazyeval_0.2.2
> [25] cli_3.0.1  later_1.2.0  htmltools_0.5.1.1  prettyunits_1.1.1  tools_4.0.4  gtable_0.3.0  glue_1.4.2  Rcpp_1.0.7
> [33] carData_3.0-4  cellranger_1.1.0  vctrs_0.3.8  svglite_2.0.0  xfun_0.25  ps_1.6.0  openxlsx_4.2.4  testthat_3.0.4
> [41] rvest_1.0.1  mime_0.11  miniUI_0.1.1.1  lifecycle_1.0.0  rstatix_0.7.0  hms_1.1.0  promises_1.2.0.1  curl_4.3.2
> [49] memoise_2.0.0  stringi_1.7.3  desc_1.3.0  pkgbuild_1.2.0  zip_2.2.0  repr_1.1.3  rlang_0.4.11  pkgconfig_2.0.3
> [57] systemfonts_1.0.2  lattice_0.20-41  evaluate_0.14  htmlwidgets_1.5.3  bit_4.0.4  tidyselect_1.1.1  processx_3.5.2  magrittr_2.0.1
> [65] R6_2.5.1  generics_0.1.0  pillar_1.6.2  foreign_0.8-81  withr_2.4.2  abind_1.4-5  modelr_0.1.8  crayon_1.4.1
> [73] car_3.0-11  Quandl_2.11.0  utf8_1.2.2  tzdb_0.1.2  callr_3.7.0  reprex_2.0.1  digest_0.6.27  webshot_0.5.2
> [81] xtable_1.8-4  httpuv_1.6.1  munsell_0.5.0  viridisLite_0.4.0  quadprog_1.5-8  sessioninfo_1.1.1
> System Information
> --------------------------------------------------
> sysname : Windows
> release : 10 x64
> version : build 18363
> machine : x86-64
> Platform Information
> --------------------------------------------------
> OS.type : windows
> file.sep : /
> dynlib.ext : .dll
> GUI : RStudio
> endian : little
> pkgType : win.binary
> path.sep : ;
> r_arch : x64
> R Version
> --------------------------------------------------
> platform : x86_64-w64-mingw32
> arch : x86_64
> os : mingw32
> system : x86_64, mingw32
> status :
> major : 4
> minor : 0.4
> year : 2021
> month : 02
> day : 15
> svn rev : 80002
> language : R
> version.string : R version 4.0.4 (2021-02-15)
> nickname : Lost Library Book
>            Reporter: Pal
>            Priority: Blocker
>             Fix For: 5.0.1
>
>
> Hi,
>
> I encounter a fatal error with the new version of Arrow R (5.0.0) that I did not have with its older version (4.0.1). Basically, after running "open_dataset", I filter and collect the data into a data frame; then RStudio crashes:
>
> {code:java}
> ds <- arrow::open_dataset(sources = "XXXX", partitioning = c("XX","YY","ZZ"))
> df <- ds %>%
>   filter(year >= 2014 & year <= 2020 & type %in% c("XX", "YY") & sector == "ABC" & identifier %in% list_identifiers & type == "LE" & val == "M") %>%
>   select(period, obs_value) %>%
>   collect()
> {code}
>
> Unfortunately, I can provide neither the exact code nor a reproduction of the problem. The dataset is very large and I do not understand the precise source of the error. All I know is that RStudio crashes and that this code worked perfectly in the older version of the package.
>
> Also, please note that I disabled multithreading with:
> {code:java}
> options(arrow.use_threads = FALSE){code}
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)