[ https://issues.apache.org/jira/browse/ARROW-15730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496358#comment-17496358 ]

Will Jones commented on ARROW-15730:
------------------------------------

Yes, I think I can reproduce this. Essentially, R and Arrow report freeing that 
memory, but the OS still reports it as used. I believe this is expected 
behavior for the underlying memory pools; they tend not to release memory back 
to the OS very aggressively, on the expectation that they will reuse it.

If you instead use the system allocator, this issue should go away (it did 
when I tested locally):

{code:R}
# You must set this environment variable *before* loading arrow
Sys.setenv(ARROW_DEFAULT_MEMORY_POOL = "system")
library(arrow)
arrow_info()$memory$backend_name
# [1] "system"
{code}
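
As a sanity check, you can also watch Arrow's own accounting drop after the 
object is deleted, even while the OS still shows the memory as resident. A 
minimal sketch (it assumes the default_memory_pool() binding exposed by recent 
arrow releases, and uses the read_arrow() call from the report; in newer 
versions read_feather()/read_ipc_stream() replace it):

{code:R}
library(arrow)

pool <- default_memory_pool()
tbl <- read_arrow("file.arrow5")  # the file from this report; any IPC file works
pool$bytes_allocated              # large while the table is alive

rm(tbl)
gc()
pool$bytes_allocated              # drops back toward zero even though the OS
                                  # may still report the memory as in use
{code}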

However, depending on your application, you probably don't want to do this. 
Even though mimalloc may appear to use more memory, that doesn't necessarily 
mean it is handling memory worse than the system allocator. See these 
discussions:

 * [https://github.com/microsoft/mimalloc/issues/393#issuecomment-828707830]
 * [https://issues.apache.org/jira/browse/ARROW-14790?focusedCommentId=17447365]

To pull out one quote from one of the maintainers of mimalloc:

{quote}
However, generally mimalloc will only hold on to virtual memory and will return 
physical memory to the OS. Now, generally mimalloc flags unused memory as 
available to the OS and the OS will use that memory when there is memory 
pressure (MEM_RESET on windows, MADV_FREE on Linux) -- however, the OS does not 
always show that memory as available (even though it is) as it is only 
reclaimed under memory pressure.
{quote}

It's also possible that there is a bug in the mimalloc version we are using 
(1.7.3), but the next release (2.0.x) is still in alpha: 
[https://github.com/microsoft/mimalloc/issues/383]

> [R] Memory usage in R blows up
> ------------------------------
>
>                 Key: ARROW-15730
>                 URL: https://issues.apache.org/jira/browse/ARROW-15730
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>            Reporter: Christian
>            Assignee: Will Jones
>            Priority: Major
>             Fix For: 6.0.1
>
>         Attachments: image-2022-02-19-09-05-32-278.png
>
>
> Hi,
> I'm trying to load a ~10gb arrow file into R (under Windows)
> _(The file is generated in the 6.0.1 arrow version under Linux)._
> For whatever reason the memory usage blows up to ~110-120gb (in a fresh and 
> empty R instance).
> The weird thing is that when deleting the object again and running gc(), the 
> memory usage only goes down to 90gb. The delta of ~20-30gb is what I would 
> have expected the dataframe to use up in memory (and that's also approximately 
> what was used in total during the load when running the old arrow version 
> 0.15.1, and it is what R shows me when just printing the object size.)
> The commands I'm running are simply:
> options(arrow.use_threads=FALSE);
> arrow::set_cpu_count(1); # need this - otherwise it freezes under windows
> arrow::read_arrow('file.arrow5')
> Is arrow reserving some resources in the background and not giving them up 
> again? Are there some settings I need to change for this?
> Is this something that is known and fixed in a newer version?
> *Note* that this doesn't happen in Linux. There all the resources are freed 
> up when calling the gc() function - not sure if it matters but there I also 
> don't need to set the cpu count to 1.
> Any help would be appreciated.


