jeanetteclark opened a new issue, #34780:
URL: https://github.com/apache/arrow/issues/34780

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Hi Arrow! Recently (with Arrow 11), R has been crashing completely when I run `write_dataset` on a dataset that previously worked fine. I haven't been able to reproduce the crash with dummy data, so my best reprex, below, uses my production dataset. With threading turned off, everything works, but the write takes almost 10 minutes, much longer than it used to, so I'd prefer to keep threading on if I can.
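
   For reference, the threads-off workaround can be scoped to a single call, so the rest of the session keeps multithreading. This is only a minimal sketch: `with_arrow_threads_off` is a hypothetical helper name (not part of arrow), and it assumes the `arrow.use_threads` option is read when the wrapped call runs.

   ```r
   # Hypothetical helper (not an arrow function): evaluate `expr` with
   # options(arrow.use_threads = FALSE), then restore the previous setting.
   with_arrow_threads_off <- function(expr) {
     old <- options(arrow.use_threads = FALSE)  # returns the old value invisibly
     on.exit(options(old), add = TRUE)          # restore even if `expr` errors
     force(expr)                                # `expr` is lazy, so it is evaluated here
   }

   # Intended usage with the reprex below (assumes arrow is attached and
   # `res_fish` is loaded):
   # with_arrow_threads_off(
   #   write_dataset(res_fish, "test_data", format = "parquet", partitioning = "Taxa")
   # )
   ```

   This avoids the crash at the cost of the slow single-threaded write described above, so it is a stopgap rather than a fix.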
   
   ```
   library(arrow)
   
   dest <- tempfile()
   t <- getOption('timeout')
   options(timeout = 600)
   
   # 18 MB
   download.file("https://portal.edirepository.org/nis/dataviewer?packageid=edi.1075.1&entityid=926f4aa8484f185b69bc1827fa67d40c",
                 dest)
   
   load(dest) # ~2 GB uncompressed
   
   options(arrow.use_threads = TRUE)
   
   system.time(write_dataset(res_fish,
                 "test_data",
                 format = "parquet",
                 partitioning = "Taxa"))
   
   options(timeout = t)
   ```
   
   <details>
     <summary>Session Info</summary>
   
     ```
   R version 4.2.3 (2023-03-15)
   Platform: aarch64-apple-darwin20 (64-bit)
   Running under: macOS Monterey 12.6
   
   Matrix products: default
   LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
   
   locale:
   [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
   
   attached base packages:
   [1] stats     graphics  grDevices utils     datasets  methods   base     
   
   loaded via a namespace (and not attached):
    [1] arrow_11.0.0.3   assertthat_0.2.1 brio_1.1.3       rappdirs_0.3.3   R6_2.5.1         lifecycle_1.0.3
    [7] magrittr_2.0.3   rlang_1.1.0      cli_3.6.1        rstudioapi_0.14  testthat_3.1.7   vctrs_0.6.1
   [13] tools_4.2.3      bit64_4.0.5      glue_1.6.2       purrr_1.0.1      bit_4.0.5        compiler_4.2.3
   [19] tidyselect_1.2.0 EDIutils_1.0.2
     ```
   </details>
   
   Following the instructions on the very helpful debugging page, I got this error:
   
   ```
   > write_dataset(res_fish,
   +               "test_data",
   +               format = "parquet",
   +               partitioning = "Taxa")
   Process 1622 stopped
   * thread #14, stop reason = EXC_BAD_ACCESS (code=2, address=0x170493fe0)
       frame #0: 0x00000001b885a994 libsystem_malloc.dylib`nanov2_allocate_from_block + 8
   libsystem_malloc.dylib`nanov2_allocate_from_block:
   ->  0x1b885a994 <+8>:  stp    x28, x27, [sp, #0x20]
       0x1b885a998 <+12>: stp    x26, x25, [sp, #0x30]
       0x1b885a99c <+16>: stp    x24, x23, [sp, #0x40]
       0x1b885a9a0 <+20>: stp    x22, x21, [sp, #0x50]
   Target 0: (R) stopped.
   
   ```
   
   Thanks for any help, and sorry I haven't been able to put together an example that doesn't require downloading a bunch of data.
   
   Potentially related issues are #34211 and #34539
   
   ### Component(s)
   
   R

