[
https://issues.apache.org/jira/browse/ARROW-15730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496413#comment-17496413
]
Christian edited comment on ARROW-15730 at 2/23/22, 12:55 AM:
--------------------------------------------------------------
Three additional questions:
* Did the memory model (i.e. keeping a copy within Arrow) change after 0.15, or
was it introduced in a later version? That is the version I was using before,
and I never had this kind of memory "issue" with it (understood that "issue" is
not necessarily the right word). The double counting just seems very punitive.
I just tried "system" and it does free the memory up (as you said), but for a
while R uses about 70 GB when the actual object size within R is just 30 GB.
* Do you know whether factors written from R (which then show up as dictionary
columns in the table schema) are especially punitive compared to strings? (See
the small schema check below.)
* Were you able to set Sys.setenv(ARROW_DEFAULT_MEMORY_POOL="system") within
RStudio? I tried a few different ways and it always just shows me mimalloc. It
does work in an R console window (which is where I ran the test above). One way
to set this is sketched right below.
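A minimal sketch of one way to do this, under the assumption (per the arrow
docs) that ARROW_DEFAULT_MEMORY_POOL is only consulted when the Arrow C++
runtime initializes, so Sys.setenv() after the package has loaded comes too
late; setting it in ~/.Renviron, which RStudio reads at session startup, should
work. default_memory_pool() and its backend_name/bytes_allocated fields exist
in recent arrow releases:

# In ~/.Renviron (takes effect in the next fresh RStudio session):
# ARROW_DEFAULT_MEMORY_POOL=system

library(arrow)
pool <- default_memory_pool()
pool$backend_name      # should now print "system" instead of "mimalloc"
pool$bytes_allocated   # bytes Arrow itself holds, separate from R's own copy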
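On the factor question, a quick way to confirm the round-trip behaviour - a
hedged sketch using Table$create(), which converts an R factor column into an
Arrow dictionary column (the column names here are made up for illustration):

library(arrow)
df <- data.frame(f = factor(c("a", "b", "a")),
                 s = c("a", "b", "a"),
                 stringsAsFactors = FALSE)
tab <- Table$create(df)
tab$schema   # f shows as dictionary<values=string, ...>, s as plain string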
> [R] Memory usage in R blows up
> ------------------------------
>
> Key: ARROW-15730
> URL: https://issues.apache.org/jira/browse/ARROW-15730
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Reporter: Christian
> Assignee: Will Jones
> Priority: Major
> Fix For: 6.0.1
>
> Attachments: image-2022-02-19-09-05-32-278.png
>
>
> Hi,
> I'm trying to load a ~10 GB Arrow file into R (under Windows).
> _(The file was generated with Arrow 6.0.1 under Linux.)_
> For whatever reason the memory usage blows up to ~110-120 GB (in a fresh and
> empty R instance).
> The weird thing is that when deleting the object again and running gc(), the
> memory usage only goes down to 90 GB. The delta of ~20-30 GB is what I would
> have expected the data frame to use in memory (that is also approximately what
> was used in total during the load with the old Arrow version 0.15.1, and it is
> what R shows me when just printing the object size).
> The commands I'm running are simply:
> options(arrow.use_threads = FALSE)
> arrow::set_cpu_count(1)  # need this - otherwise it freezes under Windows
> arrow::read_arrow('file.arrow5')
> Is arrow reserving some resources in the background and not giving them up
> again? Are there some settings I need to change for this?
> Is this something that is known and fixed in a newer version?
> *Note* that this doesn't happen on Linux: there, all the resources are freed
> up when calling gc(). Not sure if it matters, but on Linux I also don't need
> to set the CPU count to 1.
> Any help would be appreciated.
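For reference, a hedged repro sketch of the quoted commands with an explicit
cleanup step, to separate R's copy of the data from what Arrow's pool still
holds (read_arrow() is the reader the report uses; it was current in arrow
6.x, and the file name comes from the report):

options(arrow.use_threads = FALSE)
arrow::set_cpu_count(1)                        # per the report: avoids a freeze on Windows
df <- arrow::read_arrow("file.arrow5")         # materializes the file as a data.frame
print(object.size(df), units = "Gb")           # R's own copy (~30 GB in the report)
rm(df); gc()                                   # drop the R copy and collect
arrow::default_memory_pool()$bytes_allocated   # what Arrow's allocator still retains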