paleolimbot commented on issue #14474:
URL: https://github.com/apache/arrow/issues/14474#issuecomment-1376156346

   I'm going to try to rig a solution for this for the upcoming release, since 
we have a lot of open issues about this one 
([ARROW-18313](https://issues.apache.org/jira/browse/ARROW-18313), 
[ARROW-17208](https://issues.apache.org/jira/browse/ARROW-17208), 
[ARROW-17002](https://issues.apache.org/jira/browse/ARROW-17002), 
[ARROW-16421](https://issues.apache.org/jira/browse/ARROW-16421), 
[ARROW-16452](https://issues.apache.org/jira/browse/ARROW-16452)).
   
   We can discuss on the PR, but basically, we create many temporary R6 objects 
in the process of creating an ExecPlan. Those R objects keep shared pointers 
alive until the garbage collector runs. There are some cases where we can clean 
up some of those references by resetting the shared pointer when the function 
exits (which is predictable) rather than when the garbage collector runs 
(which is not). In the case of a `dplyr::collect()` we don't surface *any* R6 
objects to the user so there shouldn't be any need for any lingering shared_ptr 
references to exist (at least because of R).
   
   I'd propose that we add a `$unsafe_delete()` method to `ArrowObject` - or at 
least to a few types of objects - and see to what extent cleaning up those 
temporary references avoids leaving files open by the time `collect()` returns.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
