jeanetteclark opened a new issue, #34780:
URL: https://github.com/apache/arrow/issues/34780
### Describe the bug, including details regarding any error messages, version, and platform.
Hi Arrow! Recently (with Arrow 11) I have been getting a complete R crash
when running `write_dataset` on a dataset that previously worked fine. I
haven't been able to reproduce the crash with dummy data, so my best reprex,
below, uses my production dataset. If I turn threading off, everything works,
but the write takes almost 10 minutes, much longer than it used to, so I'd
prefer not to turn threading off if I can avoid it.
```r
library(arrow)

dest <- tempfile()
t <- getOption('timeout')
options(timeout = 600)

# 18 MB
download.file("https://portal.edirepository.org/nis/dataviewer?packageid=edi.1075.1&entityid=926f4aa8484f185b69bc1827fa67d40c",
              dest)
load(dest) # ~2 GB uncompressed

options(arrow.use_threads = TRUE)
system.time(write_dataset(res_fish,
                          "test_data",
                          format = "parquet",
                          partitioning = "Taxa"))
options(timeout = t)
```
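A minimal sketch of the threading-off workaround mentioned above, assuming the `arrow.use_threads` option (the same one the reprex sets to `TRUE`) is what controls it. The helper name `write_single_threaded()` is hypothetical and not part of arrow; it uses `on.exit()` so the caller's previous option value is restored even if the write fails.

```r
library(arrow)

# Hypothetical helper (not an arrow function): runs write_dataset() with
# Arrow's multithreading disabled via the arrow.use_threads option.
write_single_threaded <- function(df, path, partitioning = "Taxa") {
  # options() returns the previous value(s) so they can be restored later
  old <- options(arrow.use_threads = FALSE)
  # Restore the previous setting even if write_dataset() errors
  on.exit(options(old), add = TRUE)
  write_dataset(df, path, format = "parquet", partitioning = partitioning)
}
```

After loading `res_fish` as in the reprex, `system.time(write_single_threaded(res_fish, "test_data"))` would time the single-threaded path for comparison with the multithreaded run.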
<details>
<summary>Session Info</summary>
```
R version 4.2.3 (2023-03-15)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] arrow_11.0.0.3   assertthat_0.2.1 brio_1.1.3       rappdirs_0.3.3   R6_2.5.1         lifecycle_1.0.3
 [7] magrittr_2.0.3   rlang_1.1.0      cli_3.6.1        rstudioapi_0.14  testthat_3.1.7   vctrs_0.6.1
[13] tools_4.2.3      bit64_4.0.5      glue_1.6.2       purrr_1.0.1      bit_4.0.5        compiler_4.2.3
[19] tidyselect_1.2.0 EDIutils_1.0.2
```
</details>
The error I got by following the instructions from the very helpful
debugging page is:
```
> write_dataset(res_fish,
+ "test_data",
+ format = "parquet",
+ partitioning = "Taxa")
Process 1622 stopped
* thread #14, stop reason = EXC_BAD_ACCESS (code=2, address=0x170493fe0)
    frame #0: 0x00000001b885a994 libsystem_malloc.dylib`nanov2_allocate_from_block + 8
libsystem_malloc.dylib`nanov2_allocate_from_block:
->  0x1b885a994 <+8>:  stp    x28, x27, [sp, #0x20]
    0x1b885a998 <+12>: stp    x26, x25, [sp, #0x30]
    0x1b885a99c <+16>: stp    x24, x23, [sp, #0x40]
    0x1b885a9a0 <+20>: stp    x22, x21, [sp, #0x50]
Target 0: (R) stopped.
```
Thanks for any help, and sorry I haven't been able to put together an example
that doesn't require downloading a bunch of data.

Potentially related issues are #34211 and #34539.
### Component(s)
R