paleolimbot opened a new pull request, #37565:
URL: https://github.com/apache/arrow/pull/37565
### Rationale for this change
`gc_memory_pool()` is the memory pool we use almost everywhere in the R package.
It uses a special allocation mechanism that calls into R to run the garbage
collector after a failed allocation (in case there are any large objects that
can be removed). When an allocation happens on another thread (which is the
case most of the time when running exec plans), the call into R may cause a
crash: even though the memory pool serialized access using a mutex, this is not
sufficient for R (for reasons I don't understand).
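For readers unfamiliar with the pattern, here is a minimal sketch of "collect garbage after a failed allocation". It is not the code in the R package: `AllocateWithGcRetry`, `allocate`, and `run_gc` are hypothetical names, and the single retry after the collection is an assumption about how such a pool would behave.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical helper (not Arrow's API): try an allocation and, if it fails,
// run a caller-supplied garbage-collection hook once before trying again.
// Returns nullptr if the second attempt fails too.
void* AllocateWithGcRetry(std::size_t size,
                          const std::function<void*(std::size_t)>& allocate,
                          const std::function<void()>& run_gc) {
  assert(allocate && run_gc);  // both hooks must be callable
  void* ptr = allocate(size);
  if (ptr == nullptr) {
    run_gc();              // stand-in for the call into R's garbage collector
    ptr = allocate(size);  // assumed: retry once now that memory may be free
  }
  return ptr;
}
```

The interesting part for this PR is `run_gc`: in this sketch it runs on whichever thread requested the allocation, which is the situation described above when the allocation comes from a worker thread.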
### What changes are included in this PR?
Use `SafeCallIntoR()` to run the garbage collector instead. This ensures
that any call into R is made from the main R thread (or errors if this is not
possible).
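The general idea of routing a call to one designated thread can be sketched as follows. This is only an illustration under stated assumptions, not how `SafeCallIntoR()` is implemented: `MainThreadDispatcher`, `Run`, `Serve`, and `Stop` are hypothetical names, and the "main thread" is simply the thread that constructs the dispatcher.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical dispatcher: tasks are executed only on the thread that
// constructed it, no matter which thread asks for them to be run.
class MainThreadDispatcher {
 public:
  MainThreadDispatcher() : main_id_(std::this_thread::get_id()) {}

  // Callable from any thread. Blocks until the task has run on the main
  // thread, or throws if the main thread has stopped serving tasks.
  void Run(std::function<void()> task) {
    if (std::this_thread::get_id() == main_id_) {
      task();  // already on the main thread: run directly
      return;
    }
    std::packaged_task<void()> wrapped(std::move(task));
    std::future<void> done = wrapped.get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (stopped_) throw std::runtime_error("main thread is not serving calls");
      queue_.push_back(std::move(wrapped));
    }
    cv_.notify_one();
    done.get();  // wait for completion; rethrows any exception from the task
  }

  // Callable from any thread: ask Serve() to return once the queue is empty.
  void Stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopped_ = true;
    }
    cv_.notify_one();
  }

  // Must be called on the main thread: runs queued tasks until stopped.
  void Serve() {
    assert(std::this_thread::get_id() == main_id_);
    std::unique_lock<std::mutex> lock(mutex_);
    while (true) {
      cv_.wait(lock, [this] { return stopped_ || !queue_.empty(); });
      if (queue_.empty()) return;  // stopped and nothing left to run
      std::packaged_task<void()> task = std::move(queue_.front());
      queue_.pop_front();
      lock.unlock();  // do not hold the lock while running user code
      task();
      lock.lock();
    }
  }

 private:
  const std::thread::id main_id_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::packaged_task<void()>> queue_;
  bool stopped_ = false;
};
```

In this sketch a worker thread would call `dispatcher.Run(run_gc)` rather than calling `run_gc()` itself, while the main thread sits in `Serve()`. Unlike a mutex, which only prevents two threads from entering at the same time, this guarantees which thread executes the call.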
### Are these changes tested?
Yes: there is an existing test that ensures this code path occurs at least
once.
### Are there any user-facing changes?
No.
Before, the following code would crash the R session every time. After this
PR, I cannot reproduce a crash:
```r
library(arrow, warn.conflicts = FALSE)
for (i in 1:100) {
  open_dataset("~/Desktop/nyc-taxi/") |>
    head()
}
```
See https://github.com/apache/arrow/issues/37513#issuecomment-1702756063 for
how to get `nyc-taxi`.