[ https://issues.apache.org/jira/browse/ARROW-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530266#comment-17530266 ]
Andrew C Thomas commented on ARROW-16423:
-----------------------------------------
Thanks Will, output below.
> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidyr_1.2.0 dplyr_1.0.8 arrow_7.0.0
loaded via a namespace (and not attached):
[1] knitr_1.38 magrittr_2.0.3 tidyselect_1.1.2 bit_4.0.4 R6_2.5.1 rlang_1.0.2
[7] fastmap_1.1.0 fansi_1.0.3 tools_4.1.3 xfun_0.30 utf8_1.2.2 DBI_1.1.2
[13] cli_3.2.0 htmltools_0.5.2 ellipsis_0.3.2 yaml_2.3.5 bit64_4.0.5 assertthat_0.2.1
[19] digest_0.6.29 tibble_3.1.6 lifecycle_1.0.1 crayon_1.5.1 purrr_0.3.4 vctrs_0.4.0
[25] glue_1.6.2 evaluate_0.15 rmarkdown_2.13 compiler_4.1.3 pillar_1.7.0 generics_0.1.2
[31] pkgconfig_2.0.3
> arrow_info()
Arrow package version: 7.0.0
Capabilities:
dataset TRUE
parquet TRUE
json TRUE
s3 TRUE
utf8proc TRUE
re2 TRUE
snappy TRUE
gzip TRUE
brotli TRUE
zstd TRUE
lz4 TRUE
lz4_frame TRUE
lzo FALSE
bz2 FALSE
jemalloc FALSE
mimalloc TRUE
Arrow options():
arrow.use_threads FALSE
Memory:
Allocator mimalloc
Current 104.98 Mb
Max 333.98 Mb
Runtime:
SIMD Level avx2
Detected SIMD Level avx2
Build:
C++ Library Version 7.0.0
C++ Compiler GNU
C++ Compiler Version 8.3.0
Git ID e78424488e24a8ce9ac68d0a212da08b7fbfc6ce
> R arrow/dplyr: simple join and collect crashes session
> ------------------------------------------------------
>
> Key: ARROW-16423
> URL: https://issues.apache.org/jira/browse/ARROW-16423
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 7.0.0
> Reporter: Andrew C Thomas
> Priority: Minor
>
> Using inner_join() as a filter on a dataset opened with open_dataset(), then
> calling collect(), crashes the R session. The crash is intermittent: it does
> not always happen on the first attempt and sometimes takes a few tries.
> Reprex follows.
> ------------------------------------------------------
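> library(arrow)
> library(dplyr)
> library(tidyr)
> # 1,000,000 rows; grouping by A and B makes write_dataset() partition on them
> DataSet <- expand_grid(A = 1:10, B = 1:10, C = 1:10000) %>%
>   group_by(A, B)
> write_dataset(DataSet, "TestBreakData")
> # Repeat the join; the crash does not always happen on the first pass
> for (DoThisUntilItBreaks in 1:100) {
>   message(DoThisUntilItBreaks)
>   # No `by` is given, so the join uses the shared columns A and B
>   D2 <- open_dataset("TestBreakData") %>%
>     inner_join(data.frame(A = 1L, B = 1:5)) %>%
>     collect()
> }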
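
For reference, here is a minimal base-R sketch of what the join in the reprex is expected to return when it does not crash. It does not use arrow and is not a confirmed workaround. The names Keys, Joined, and Filtered are made up for illustration, and the data is built in memory with expand.grid() instead of being read back from "TestBreakData".

```r
# In-memory stand-in for the reprex data: same columns and values, no arrow.
DataSet <- expand.grid(A = 1:10, B = 1:10, C = 1:10000)

# The key table from the reprex; A is recycled to match the five B values.
Keys <- data.frame(A = 1L, B = 1:5)

# Inner join on the shared columns A and B.
Joined <- merge(DataSet, Keys, by = c("A", "B"))

# The same selection written as a plain filter.
Filtered <- DataSet[DataSet$A == 1L & DataSet$B %in% 1:5, ]

nrow(Joined)    # 50000
nrow(Filtered)  # 50000
```

Each of the five (A, B) key pairs matches 10,000 values of C, so a successful collect should give 50,000 rows.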
--
This message was sent by Atlassian Jira
(v8.20.7#820007)