[ https://issues.apache.org/jira/browse/ARROW-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530266#comment-17530266 ]

Andrew C Thomas commented on ARROW-16423:
-----------------------------------------

Thanks Will, output below.

 

> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
 

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tidyr_1.2.0 dplyr_1.0.8 arrow_7.0.0

loaded via a namespace (and not attached):
 [1] knitr_1.38       magrittr_2.0.3   tidyselect_1.1.2 bit_4.0.4        R6_2.5.1         rlang_1.0.2
 [7] fastmap_1.1.0    fansi_1.0.3      tools_4.1.3      xfun_0.30        utf8_1.2.2       DBI_1.1.2
[13] cli_3.2.0        htmltools_0.5.2  ellipsis_0.3.2   yaml_2.3.5       bit64_4.0.5      assertthat_0.2.1
[19] digest_0.6.29    tibble_3.1.6     lifecycle_1.0.1  crayon_1.5.1     purrr_0.3.4      vctrs_0.4.0
[25] glue_1.6.2       evaluate_0.15    rmarkdown_2.13   compiler_4.1.3   pillar_1.7.0     generics_0.1.2
[31] pkgconfig_2.0.3
> arrow_info()
Arrow package version: 7.0.0

Capabilities:
               
dataset    TRUE
parquet    TRUE
json       TRUE
s3         TRUE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2       FALSE
jemalloc  FALSE
mimalloc   TRUE

Arrow options():
                       
arrow.use_threads FALSE

Memory:
                   
Allocator  mimalloc
Current   104.98 Mb
Max       333.98 Mb

Runtime:
                        
SIMD Level          avx2
Detected SIMD Level avx2

Build:
                                                             
C++ Library Version                                     7.0.0
C++ Compiler                                              GNU
C++ Compiler Version                                    8.3.0
Git ID               e78424488e24a8ce9ac68d0a212da08b7fbfc6ce

> R arrow/dplyr: simple join and collect crashes session
> ------------------------------------------------------
>
>                 Key: ARROW-16423
>                 URL: https://issues.apache.org/jira/browse/ARROW-16423
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 7.0.0
>            Reporter: Andrew C Thomas
>            Priority: Minor
>
> Trying to do an inner-join-style filter on an open_dataset() result crashes R,
> but not reliably on the first attempt; sometimes it takes a couple of tries.
> Reprex follows.
> ------------------------------------------------------
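> library(arrow)
> library(dplyr)
> library(tidyr)
>
> # 10 x 10 x 10000 = 1e6 rows; grouping by A and B makes write_dataset()
> # partition the output directory on those two columns (100 partitions)
> DataSet <- expand_grid(A = 1:10, B = 1:10, C = 1:10000) %>%
>   group_by(A, B)
> write_dataset(DataSet, "TestBreakData")
>
> # The crash is intermittent, so repeat the join until the session dies
> for (DoThisUntilItBreaks in 1:100) {
>   message(DoThisUntilItBreaks)
>   D2 <- open_dataset("TestBreakData") %>%
>     inner_join(data.frame(A = 1L, B = 1:5)) %>%
>     collect()
> }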



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
