paleolimbot commented on issue #36161: URL: https://github.com/apache/arrow/issues/36161#issuecomment-1597661645
The Windows Task Manager, `memory.size()`, and `gc()` all get their numbers from different places, so I'm not surprised that there are differences (although I'm not familiar with the details on Windows). I do know that any allocations made by Arrow C++ won't show up in `gc()`; however, you can track these allocations using `default_memory_pool()$bytes_allocated`. Note that there are some hidden references to objects that are not always apparent (for example, when converting a Table to a data.frame, some columns may be zero-copy shells around Arrow arrays).

``` r
library(arrow, warn.conflicts = FALSE)

default_memory_pool()$bytes_allocated
#> [1] 0
default_memory_pool()$max_memory
#> [1] 0

# no bytes allocated because it has re-used R's memory
array <- as_arrow_array(1:10)
default_memory_pool()$bytes_allocated
#> [1] 0
default_memory_pool()$max_memory
#> [1] 0

# Can't re-use R memory for decimal type, so this will trigger an Arrow allocation
array <- as_arrow_array(1:10, type = decimal(10, 3))
default_memory_pool()$bytes_allocated
#> [1] 192
default_memory_pool()$max_memory
#> [1] 256

rm(array)
gc()
#>           used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells  803037 42.9    1418702 75.8         NA  1418702 75.8
#> Vcells 1370077 10.5    8388608 64.0      16384  2707166 20.7
default_memory_pool()$bytes_allocated
#> [1] 0
default_memory_pool()$max_memory
#> [1] 0256
```

<sup>Created on 2023-06-19 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
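If it helps, the same check can be wrapped in a small helper that reports how many bytes the Arrow memory pool gained while an expression was evaluated. This is only a sketch: `arrow_alloc_delta()` is a hypothetical name and not part of the arrow package, and the exact byte counts will depend on your arrow version, allocator backend, and platform.

``` r
library(arrow, warn.conflicts = FALSE)

# Hypothetical helper (not part of arrow): evaluate `expr` and report the
# change in bytes held by Arrow's default memory pool.
arrow_alloc_delta <- function(expr) {
  pool <- default_memory_pool()
  before <- pool$bytes_allocated
  result <- force(expr)  # `expr` is a lazy promise, so it is evaluated here
  after <- pool$bytes_allocated
  # Return the result as well so the Arrow object stays referenced
  list(result = result, delta = after - before)
}

# Converting to decimal cannot re-use R's memory, so Arrow has to allocate
res <- arrow_alloc_delta(as_arrow_array(1:10, type = decimal(10, 3)))
res$delta

# Dropping the last reference and running gc() lets Arrow release the buffer
rm(res)
invisible(gc())
default_memory_pool()$bytes_allocated
```

A delta of zero does not necessarily mean no memory is in use: as in the first example above, Arrow may simply be re-using memory that R already owns, which is counted by `gc()` instead.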
