kevinpemonon opened a new issue, #36161:
URL: https://github.com/apache/arrow/issues/36161
### Describe the bug, including details regarding any error messages, version, and platform.
Hello,
I'm using R 4.1.3 and 4.2.1 on Windows and I'm seeing a discrepancy in
reported memory usage.
My program uses the arrow and dplyr libraries. When I compare the memory
reported by the Windows Task Manager with the value returned by
memory.size(max=F), Task Manager shows much more: 244.1 MB versus 76.58 MB.
This happens even though I delete the objects I created with rm() and then
call gc() to reclaim the memory they used.
Below is the R code I used to reproduce the problem, with and without output:
- With output:
`
> gc(verbose = TRUE)
Garbage collection 2 = 0+0+2 (level 2) ...
14.2 Mbytes of cons cells used (41%)
3.9 Mbytes of vectors used (6%)
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 264908 14.2 648748 34.7 401965 21.5
Vcells 500529 3.9 8388608 64.0 1671274 12.8
>
> # Baseline memory
> memory.size(max=F)
[1] 28.78
>
> library(arrow)
Attaching package: ‘arrow’
The following object is masked from ‘package:utils’:
timestamp
> # Memory after loading the arrow library
> memory.size(max=F)
[1] 51.32
>
> library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
> # Memory after loading the dplyr library
> memory.size(max=F)
[1] 90.2
>
> gc(verbose = TRUE)
Garbage collection 13 = 8+1+4 (level 2) ...
36.0 Mbytes of cons cells used (49%)
9.0 Mbytes of vectors used (14%)
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 673549 36 1380031 73.8 1380031 73.8
Vcells 1174575 9 8388608 64.0 1853297 14.2
>
> df <- data.frame(
+ col1 = rnorm(1000000),
+ col2 = rnorm(1000000),
+ col3 = runif(1000000),
+ col4 = sample(1:999, size = 1000000, replace = T),
+ col5 = sample(c("GroupA", "GroupB"), size = 1000000, replace = T),
+ col6 = sample(c("TypeA", "TypeB"), size = 1000000, replace = T)
+ )
> # Memory after creating the df object
> memory.size(max=F)
[1] 117.97
>
> arrow::write_dataset(
+ df,
+ paste0(Sys.getenv("USERPROFILE"),"/ExProblemeGc"),
+ format = "parquet"
+ )
> # Memory after writing to disk
> memory.size(max=F)
[1] 117.53
>
> rm(df)
> # Memory after removing df
> memory.size(max=F)
[1] 117.56
>
> gc(verbose = TRUE)
Garbage collection 15 = 9+1+5 (level 2) ...
45.0 Mbytes of cons cells used (61%)
38.0 Mbytes of vectors used (49%)
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 842172 45 1380031 73.8 1380031 73.8
Vcells 4976076 38 10146329 77.5 8362972 63.9
> # Memory after gc(verbose = TRUE)
> memory.size(max=F)
[1] 93.57
>
> gc(verbose = TRUE)
Garbage collection 16 = 9+1+6 (level 2) ...
45.0 Mbytes of cons cells used (61%)
11.3 Mbytes of vectors used (15%)
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 842053 45.0 1380031 73.8 1380031 73.8
Vcells 1475891 11.3 10146329 77.5 8362972 63.9
> # Memory after gc(verbose = TRUE)
> memory.size(max=F)
[1] 66.63
>
> ds <- arrow::open_dataset(paste0(Sys.getenv("USERPROFILE"),"/ExProblemeGc"))
> # Memory after creating ds
> memory.size(max=F)
[1] 71.11
>
> req <-
+ ds %>%
+ collect()
> # Memory after creating req
> memory.size(max=F)
[1] 77.09
>
> rm(req)
> # Memory after removing req
> memory.size(max=F)
[1] 77.15
>
> gc(verbose = TRUE)
Garbage collection 17 = 9+1+7 (level 2) ...
49.6 Mbytes of cons cells used (52%)
12.5 Mbytes of vectors used (16%)
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 927293 49.6 1797205 96.0 1380031 73.8
Vcells 1627658 12.5 10146329 77.5 8362972 63.9
> # Memory after gc(verbose = TRUE)
> memory.size(max=F)
[1] 68.49
>
> gc(verbose = TRUE)
Garbage collection 18 = 9+1+8 (level 2) ...
49.6 Mbytes of cons cells used (52%)
12.5 Mbytes of vectors used (16%)
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 927239 49.6 1797205 96.0 1380031 73.8
Vcells 1627568 12.5 10146329 77.5 8362972 63.9
> # Memory after gc(verbose = TRUE)
> memory.size(max=F)
[1] 68.49
>
> rm(ds)
> # Memory after removing ds
> memory.size(max=F)
[1] 68.49
>
> gc(verbose = TRUE)
Garbage collection 19 = 9+1+9 (level 2) ...
49.6 Mbytes of cons cells used (52%)
12.5 Mbytes of vectors used (16%)
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 927149 49.6 1797205 96.0 1380031 73.8
Vcells 1627532 12.5 10146329 77.5 8362972 63.9
> # Memory after gc(verbose = TRUE)
> memory.size(max=F)
[1] 68.49
>
> gc(verbose = TRUE)
Garbage collection 20 = 9+1+10 (level 2) ...
49.6 Mbytes of cons cells used (52%)
12.5 Mbytes of vectors used (16%)
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 927146 49.6 1797205 96.0 1380031 73.8
Vcells 1627527 12.5 10146329 77.5 8362972 63.9
> # Memory after gc(verbose = TRUE)
> memory.size(max=F)
[1] 68.49
`
- Without output:
`
gc(verbose = TRUE)
# Baseline memory
memory.size(max=F)
library(pryr)
# Memory after loading the pryr library, per memory.size
memory.size(max=F)
library(arrow)
# Memory after loading the arrow library, per memory.size
memory.size(max=F)
library(dplyr)
# Memory after loading the dplyr library, per memory.size
memory.size(max=F)
gc(verbose = TRUE)
memory.size(max=F)
df <- data.frame(
  col1 = rnorm(1000000),
  col2 = rnorm(1000000),
  col3 = runif(1000000),
  col4 = sample(1:999, size = 1000000, replace = T),
  col5 = sample(c("GroupA", "GroupB"), size = 1000000, replace = T),
  col6 = sample(c("TypeA", "TypeB"), size = 1000000, replace = T)
)
# Memory after creating the df object
memory.size(max=F)
arrow::write_dataset(
  df,
  paste0(Sys.getenv("USERPROFILE"),"/ExProblemeGc"),
  format = "parquet"
)
# Memory after writing to disk
memory.size(max=F)
rm(df)
# Memory after removing df
memory.size(max=F)
gc(verbose = TRUE)
# Memory after gc(verbose = TRUE)
memory.size(max=F)
gc(verbose = TRUE)
# Memory after gc(verbose = TRUE)
memory.size(max=F)
ds <- arrow::open_dataset(paste0(Sys.getenv("USERPROFILE"),"/ExProblemeGc"))
# Memory after creating ds
memory.size(max=F)
req <-
  ds %>%
  collect()
# Memory after creating req
memory.size(max=F)
rm(req)
# Memory after removing req
memory.size(max=F)
gc(verbose = TRUE)
# Memory after gc(verbose = TRUE)
memory.size(max=F)
gc(verbose = TRUE)
# Memory after gc(verbose = TRUE)
memory.size(max=F)
rm(ds)
# Memory after removing ds
memory.size(max=F)
gc(verbose = TRUE)
# Memory after gc(verbose = TRUE)
memory.size(max=F)
gc(verbose = TRUE)
# Memory after gc(verbose = TRUE)
memory.size(max=F)
`
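The repeated memory measurements in the script above could also be collected into a small table, which makes the before/after comparison easier to read. The following is only a sketch in base R: the helper name `snapshot()` and the `mem_log` table are hypothetical, and it reads the "(Mb)" column next to "used" in the `gc()` result, so it covers only memory managed by R's garbage collector, not the whole process as shown by Task Manager.

```r
# Hypothetical helper (not part of the original script): one labelled row
# holding the Mb currently used by R's heap (Ncells + Vcells).
snapshot <- function(label) {
  # gc() returns a matrix; column 2 is the "(Mb)" value next to "used"
  data.frame(step = label, heap_mb = sum(gc()[, 2]))
}

mem_log <- snapshot("baseline")

x <- numeric(5e6)                      # roughly 38 Mb of doubles
mem_log <- rbind(mem_log, snapshot("after creating x"))

rm(x)                                  # drop the only reference to the vector
mem_log <- rbind(mem_log, snapshot("after rm(x) and gc()"))

print(mem_log)
```

In the reproduction, `snapshot()` would be called at the points where `memory.size(max=F)` is called now.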
Is this difference in reported memory expected?
Could it be caused by the libraries I'm using, or by poor practice in my
R code?
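One possible way to check whether the arrow library accounts for part of the gap: as far as I understand, Arrow allocates its buffers through its own C++ memory pool rather than through R's garbage-collected heap, so that memory would not be counted by `gc()`. The sketch below reads the pool counters exposed by the arrow R package (`default_memory_pool()`, `bytes_allocated`, `max_memory`). The helper name `arrow_pool_mb()` is hypothetical, and it returns `NA` when arrow is not installed.

```r
# Sketch: report Arrow's own memory pool usage in Mb.
# Assumes the arrow package is installed; returns NA values otherwise.
arrow_pool_mb <- function() {
  if (!requireNamespace("arrow", quietly = TRUE)) {
    return(c(allocated = NA_real_, peak = NA_real_))
  }
  pool <- arrow::default_memory_pool()
  c(
    allocated = as.numeric(pool$bytes_allocated) / 1024^2,  # bytes held now
    peak      = as.numeric(pool$max_memory) / 1024^2        # high-water mark
  )
}

print(arrow_pool_mb())
```

Calling it after `collect()` and again after `rm()` plus `gc()` would show whether the pool shrinks. Even if it does, I assume the allocator may keep freed pages for reuse, so the Task Manager figure need not drop by the same amount.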
Thank you for your help. I'm happy to provide any further information you
need.
Best regards,
### Component(s)
R