[
https://issues.apache.org/jira/browse/ARROW-15730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496274#comment-17496274
]
Christian edited comment on ARROW-15730 at 2/22/22, 6:54 PM:
-------------------------------------------------------------
Note that this is with a cut-down version. Here the file is ~1 GB and is
written with the "default" compression. Reading it into R takes about 3-5 GB,
and the space that doesn't get freed up is ~7 GB.
(Deleting the table and running another gc() keeps the 7 GB allocated.)
> arrow_info()$memory
$backend_name
[1] "mimalloc"
$bytes_allocated
[1] 5379819648
$max_memory
[1] 5379819648
$available_backends
[1] "mimalloc" "system"
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 2625778 140.3 4439937 237.2 4439937 237.2
Vcells 7082576 54.1 12255594 93.6 9236142 70.5
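The diagnostics above can be reproduced without the large file by allocating a small Arrow table in memory and watching the allocator's counters. This is a minimal sketch, assuming the arrow R package (>= 6.0) is installed; the exact numbers will differ by platform and allocator:

```r
# Sketch: observe Arrow's allocator counters around an allocation.
library(arrow)

before <- arrow_info()$memory$bytes_allocated
tbl <- Table$create(x = 1:1e6)  # allocate some Arrow-managed memory
stopifnot(arrow_info()$memory$bytes_allocated > before)

rm(tbl)
gc()  # R releases its handle to the table...
# ...but arrow_info()$memory$bytes_allocated may not return to `before`
# immediately, because the allocator (mimalloc here) can retain pages.
arrow_info()$memory$bytes_allocated
```

Note that `bytes_allocated` tracks what Arrow's memory pool holds, independently of what the OS reports for the R process.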
> table$schema
Schema
int32
date32[day]
string
string
string
date32[day]
string
string
double
double
double
int32
int32
string
double
string
double
double
string
double
double
string
string
double
string
bool
date32[day]
string
string
string
string
string
string
int32
int32
int32
int32
int32
int32
string
int32
string
int32
string
string
string
string
string
string
string
string
string
string
string
string
string
string
string
string
string
double
double
int32
double
date32[day]
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
dictionary<values=string, indices=int8>
> [R] Memory usage in R blows up
> ------------------------------
>
> Key: ARROW-15730
> URL: https://issues.apache.org/jira/browse/ARROW-15730
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Reporter: Christian
> Assignee: Will Jones
> Priority: Major
> Fix For: 6.0.1
>
> Attachments: image-2022-02-19-09-05-32-278.png
>
>
> Hi,
> I'm trying to load a ~10 GB Arrow file into R (under Windows).
> _(The file was generated with arrow 6.0.1 under Linux.)_
> For whatever reason the memory usage blows up to ~110-120 GB (in a fresh,
> empty R instance).
> The weird thing is that after deleting the object again and running gc(), the
> memory usage only goes down to ~90 GB. The delta of ~20-30 GB is what I would
> have expected the data frame to use in memory (it is also approximately what
> was used in total during the load under the old arrow version 0.15.1, and
> what R shows when just printing the object size).
> The commands I'm running are simply:
> options(arrow.use_threads=FALSE);
> arrow::set_cpu_count(1); # need this - otherwise it freezes under windows
> arrow::read_arrow('file.arrow5')
> Is arrow reserving some resources in the background and not giving them up
> again? Are there settings I need to change for this?
> Is this something that is known and fixed in a newer version?
> *Note* that this doesn't happen on Linux: there, all the resources are freed
> when calling gc(). Not sure if it matters, but on Linux I also don't need to
> set the CPU count to 1.
> Any help would be appreciated.
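One setting worth experimenting with, given the allocator-retention behavior described above, is the memory-pool backend. A sketch, assuming arrow >= 6.0 where the backend is selectable via the ARROW_DEFAULT_MEMORY_POOL environment variable (it is read once, when the library initializes, so it must be set before loading arrow):

```r
# Sketch: select the "system" allocator instead of mimalloc.
# Must run BEFORE library(arrow) in a fresh R session.
Sys.setenv(ARROW_DEFAULT_MEMORY_POOL = "system")
library(arrow)
arrow_info()$memory$backend_name  # should now report "system"
```

Whether this actually returns memory to the OS more eagerly on Windows is platform-dependent; it only changes which allocator backs Arrow's memory pool.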
--
This message was sent by Atlassian Jira
(v8.20.1#820001)