[ https://issues.apache.org/jira/browse/ARROW-15730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496382#comment-17496382 ]

Christian edited comment on ARROW-15730 at 2/22/22, 11:50 PM:
--------------------------------------------------------------

I have a conceptual question though: does the current setup create a "copy" of 
the file within Arrow memory (meaning, does it read the entire file into 
Arrow and then load it into R)? For large data.frames that kind of double 
counting would be an issue.
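To make the question concrete, here is roughly how I would check it - a sketch, 
assuming the MemoryPool bindings in the R package (default_memory_pool() and 
its bytes_allocated field) behave the way I think they do:

    library(arrow)
    pool <- default_memory_pool()

    df <- read_arrow("file.arrow5")   # same call as in the issue below
    pool$bytes_allocated              # bytes held on the Arrow side
    object.size(df)                   # bytes held on the R side
    # if both are roughly the size of the data, the file is counted twice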

 

And additionally, even if it isn't strictly a memory leak, it seems that once I 
delete the object and then load another one, the freed space isn't reused at 
all - Arrow keeps reserving additional memory, so the system just starts 
running out of space.
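The reuse question could be probed the same way (same assumptions as in the 
sketch above):

    rm(df); gc()
    pool$bytes_allocated              # does Arrow hand the memory back?
    df2 <- read_arrow("file.arrow5")
    pool$max_memory                   # peak allocation - does it grow on every reload?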


> [R] Memory usage in R blows up
> ------------------------------
>
>                 Key: ARROW-15730
>                 URL: https://issues.apache.org/jira/browse/ARROW-15730
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>            Reporter: Christian
>            Assignee: Will Jones
>            Priority: Major
>             Fix For: 6.0.1
>
>         Attachments: image-2022-02-19-09-05-32-278.png
>
>
> Hi,
> I'm trying to load a ~10 GB Arrow file into R (under Windows).
> _(The file was generated with arrow 6.0.1 under Linux.)_
> For whatever reason the memory usage blows up to ~110-120 GB (in a fresh and 
> empty R instance).
> The weird thing is that when I delete the object again and run gc(), the 
> memory usage only goes down to ~90 GB. The delta of ~20-30 GB is what I would 
> have expected the data.frame to take up in memory (that's also approximately 
> what was used in total during the load under the old arrow version 0.15.1, 
> and it is what R reports when printing the object size).
> The commands I'm running are simply:
> options(arrow.use_threads = FALSE)
> arrow::set_cpu_count(1)  # need this - otherwise it freezes under Windows
> arrow::read_arrow('file.arrow5')
> Is arrow reserving some resources in the background and not giving them up 
> again? Are there some settings I need to change for this?
> Is this something that is known and fixed in a newer version?
> *Note* that this doesn't happen on Linux: there, all the resources are freed 
> when calling gc(). Not sure if it matters, but on Linux I also don't need to 
> set the CPU count to 1.
> Any help would be appreciated.
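
One experiment that may bear on the "settings" question above: the arrow R 
package can reportedly be pointed at a different allocator via the 
ARROW_DEFAULT_MEMORY_POOL environment variable. A sketch, assuming the variable 
is read when the package initializes, so it has to be set in a fresh session:

    # fresh R session, before library(arrow) is called
    Sys.setenv(ARROW_DEFAULT_MEMORY_POOL = "system")  # or "jemalloc" / "mimalloc"
    library(arrow)
    default_memory_pool()$backend_name   # confirm which allocator is active
    df <- read_arrow("file.arrow5")
    rm(df); gc()
    # compare resident memory across allocators; some hold freed pages longer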



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
