paleolimbot opened a new pull request, #37565:
URL: https://github.com/apache/arrow/pull/37565

   ### Rationale for this change
   
   `gc_memory_pool()` is the memory pool we use almost everywhere in the R package. 
It uses a special allocation mechanism that calls into R to run the garbage 
collector after a failed allocation (in case there are any large objects that 
can be removed). When an allocation happens on another thread (which is most 
of the time when running exec plans), the call into R may cause a crash: 
even though the memory pool serialized access to R with a mutex, this is not 
sufficient for R (for reasons I don't understand).
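
To make the failure mode concrete, here is a minimal C++ sketch of the pattern described above. It is not the arrow R package's code: `RetryingAllocator`, `AllocFn`, and `CollectFn` are hypothetical names, and retrying the allocation once after the collector runs is an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a pool that asks a host runtime to free memory
// after a failed allocation and then tries again (the retry is an assumption).
class RetryingAllocator {
 public:
  using AllocFn = std::function<void*(std::size_t)>;
  using CollectFn = std::function<void()>;

  RetryingAllocator(AllocFn alloc, CollectFn collect)
      : alloc_(std::move(alloc)), collect_(std::move(collect)) {}

  void* Allocate(std::size_t size) {
    void* out = alloc_(size);
    if (out != nullptr) return out;
    {
      // The mutex serializes calls to the collector, but the callback still
      // runs on whichever thread called Allocate().
      std::lock_guard<std::mutex> lock(mutex_);
      collect_();
    }
    return alloc_(size);  // second attempt after the collector has run
  }

 private:
  AllocFn alloc_;
  CollectFn collect_;
  std::mutex mutex_;
};
```

In this sketch the mutex prevents two threads from running the collector at the same time, but it does not change which thread runs it. That matches the situation in this PR, where serialized access alone was not enough.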
   
   ### What changes are included in this PR?
   
   Use `SafeCallIntoR()` to run the garbage collector instead. This ensures 
that any call into R runs on the R thread that called into Arrow (or errors 
if this is not possible).
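
The general idea of handing a call back to the thread that owns the runtime can be sketched as follows. This is an illustration under stated assumptions, not how `SafeCallIntoR()` is implemented: `MainThreadDispatcher`, `SafeCall()`, and `RunWithWorker()` are hypothetical names, and the sketch assumes the background function does not throw.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical illustration only; construct it on the "main" thread.
class MainThreadDispatcher {
 public:
  MainThreadDispatcher() : main_id_(std::this_thread::get_id()) {}

  // Run fn on the main thread, or throw if the main thread cannot take it.
  void SafeCall(std::function<void()> fn) {
    if (std::this_thread::get_id() == main_id_) {
      fn();  // already on the main thread: call directly
      return;
    }
    std::packaged_task<void()> task(std::move(fn));
    std::future<void> done = task.get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (!servicing_) throw std::runtime_error("main thread is not available");
      queue_.push_back(std::move(task));
    }
    cv_.notify_one();
    done.get();  // block until the main thread has run the task
  }

  // Called on the main thread: run `background` on a worker thread and
  // execute queued tasks here until the worker finishes.
  void RunWithWorker(std::function<void()> background) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      servicing_ = true;
      finished_ = false;
    }
    std::thread worker([this, &background] {
      background();
      {
        std::lock_guard<std::mutex> lock(mutex_);
        finished_ = true;
      }
      cv_.notify_one();
    });
    std::unique_lock<std::mutex> lock(mutex_);
    while (true) {
      cv_.wait(lock, [this] { return finished_ || !queue_.empty(); });
      if (queue_.empty()) break;  // worker finished and nothing is pending
      std::packaged_task<void()> task = std::move(queue_.front());
      queue_.pop_front();
      lock.unlock();
      task();  // executes on the main thread
      lock.lock();
    }
    servicing_ = false;
    lock.unlock();
    worker.join();
  }

 private:
  std::thread::id main_id_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::packaged_task<void()>> queue_;
  bool servicing_ = false;
  bool finished_ = false;
};
```

A worker that needs the runtime calls `SafeCall()` and blocks until the main thread has run the callback. If the main thread is not currently servicing requests, the call fails with an error instead of touching the runtime from the wrong thread.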
   
   ### Are these changes tested?
   
   Yes: there is an existing test that ensures this code path occurs at least 
once.
   
   ### Are there any user-facing changes?
   
   No.
   
   Before, the following code would crash the R session every time. After this 
PR, I cannot reproduce a crash:
   
   ```r
   library(arrow, warn.conflicts = FALSE)
   for(i in 1:100) {
     open_dataset("~/Desktop/nyc-taxi/") |>
       head()
   }
   ```
   
   See https://github.com/apache/arrow/issues/37513#issuecomment-1702756063 for 
how to get `nyc-taxi`.

