[
https://issues.apache.org/jira/browse/ARROW-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447620#comment-17447620
]
Sten Larsson commented on ARROW-14790:
--------------------------------------
> We don't want to export the memory-pool-related API into Ruby yet.
> Could you use the {{ARROW_DEFAULT_MEMORY_POOL}} environment variable, such as
> {{ARROW_DEFAULT_MEMORY_POOL=system}}?
Thanks, this seems to work on my computer! I will try it in our staging
environment.
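To try the suggested workaround, the variable only has to be present in the process environment. A simple route is to set it at launch, e.g. `ARROW_DEFAULT_MEMORY_POOL=system ruby arrow_memory_leak.rb`. The sketch below assumes Arrow reads the variable when it first selects its default memory pool, and does the same thing programmatically for a child process using only the Python standard library. `run_with_system_pool` and `child_sees_system_pool` are hypothetical helper names, not part of Arrow.

```python
import os
import subprocess
import sys


def run_with_system_pool(cmd):
    """Run cmd in a child process whose environment selects Arrow's system allocator.

    Assumption: Arrow reads ARROW_DEFAULT_MEMORY_POOL when it first picks its
    default pool, so the variable is set before the child process even starts.
    """
    env = dict(os.environ, ARROW_DEFAULT_MEMORY_POOL="system")
    return subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)


def child_sees_system_pool():
    """Sanity check: spawn a Python child and confirm it sees the variable."""
    code = "import os; print(os.environ.get('ARROW_DEFAULT_MEMORY_POOL'))"
    result = run_with_system_pool([sys.executable, "-c", code])
    return result.stdout.strip() == "system"
```

For example, `run_with_system_pool(["ruby", "arrow_memory_leak.rb"]).stdout` would re-run the reproduction script with the system allocator. Assuming pyarrow is installed, a Python child could also confirm the choice with `pyarrow.default_memory_pool().backend_name`, which should report `system`.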
> BTW, does {{pa.default_memory_pool().release_unused()}} decrease memory usage
> in your environment? It doesn't change anything in my environment.
No, it does not seem to have any effect.
> Memory leak when reading CSV files
> ----------------------------------
>
> Key: ARROW-14790
> URL: https://issues.apache.org/jira/browse/ARROW-14790
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python, Ruby
> Reporter: Sten Larsson
> Priority: Major
>
> We're having a problem with a memory leak in a Ruby script that processes many
> CSV files. I have written some short scripts to demonstrate the problem:
> [https://gist.github.com/stenlarsson/60b1e4e99416738b41ee30e7ba294214]
> The first script,
> [arrow_test_csv.rb|https://gist.github.com/stenlarsson/60b1e4e99416738b41ee30e7ba294214#file-arrow_test_csv-rb],
> creates a 184 MB CSV file for testing.
> The second script,
> [arrow_memory_leak.rb|https://gist.github.com/stenlarsson/60b1e4e99416738b41ee30e7ba294214#file-arrow_memory_leak-rb],
> then loads the CSV file 10 times using Arrow. It uses the
> [get_process_mem|https://rubygems.org/gems/get_process_mem] gem to print the
> memory usage both before and after each iteration. It also invokes the
> garbage collector on each iteration to ensure the problem is not that Ruby
> holds on to any objects. This is what it prints on my MacBook Pro using Arrow
> 6.0.0:
> {noformat}
> 127577 objects, 34.234375 MB
> 127577 objects, 347.625 MB
> 127577 objects, 438.7890625 MB
> 127577 objects, 457.6953125 MB
> 127577 objects, 469.8046875 MB
> 127577 objects, 480.88671875 MB
> 127577 objects, 487.96484375 MB
> 127577 objects, 493.8359375 MB
> 127577 objects, 497.671875 MB
> 127577 objects, 498.55859375 MB
> 127577 objects, 501.42578125 MB
> {noformat}
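The gist's scripts are not reproduced here, but the measurement they describe (load the file repeatedly, force garbage collection, record object count and process memory) can be sketched in plain Python. This is a hypothetical approximation, not the author's script: `measure` and `current_rss_mb` are illustrative names, and since get_process_mem is a Ruby gem, resident memory is read from `/proc/self/statm` instead, which assumes Linux (on macOS, where the numbers above were measured, it returns `None`).

```python
import gc
import os


def current_rss_mb():
    """Current resident set size in MB, or None where /proc is unavailable.

    Assumption: Linux. /proc/self/statm reports sizes in pages, and its
    second field is the number of resident pages.
    """
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
    except OSError:
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / (1024 * 1024)


def measure(load, iterations=10):
    """Call load() repeatedly, collecting garbage and sampling memory after each call.

    Returns (tracked_object_count, rss_mb) tuples: one baseline sample plus one
    per iteration, i.e. 11 samples for 10 loads, like the output above.
    """
    samples = []

    def sample():
        gc.collect()  # rule out unreachable Python objects as the source of growth
        samples.append((len(gc.get_objects()), current_rss_mb()))

    sample()
    for _ in range(iterations):
        result = load()
        del result  # drop our reference before collecting
        sample()
    return samples
```

With pyarrow installed, the Arrow case would look like `import pyarrow.csv` followed by `measure(lambda: pyarrow.csv.read_csv("test.csv"))`, where `test.csv` stands in for the file generated by the first script.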
> The third script,
> [arrow_memory_leak.py|https://gist.github.com/stenlarsson/60b1e4e99416738b41ee30e7ba294214#file-arrow_memory_leak-py],
> is a Python implementation of the same script. This shows that the problem
> is not in the Ruby bindings:
> {noformat}
> 2106 objects, 31.75390625 MB
> 2106 objects, 382.28515625 MB
> 2106 objects, 549.41796875 MB
> 2106 objects, 656.78125 MB
> 2106 objects, 679.6875 MB
> 2106 objects, 691.9921875 MB
> 2106 objects, 708.73828125 MB
> 2106 objects, 717.296875 MB
> 2106 objects, 724.390625 MB
> 2106 objects, 729.19921875 MB
> 2106 objects, 734.47265625 MB
> {noformat}
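To compare the two runs, the per-load growth can be computed directly from the printed readings. The sketch below copies the MB values verbatim from the Ruby and Python outputs; `increments` and `summarize` are illustrative helpers, not part of the gist, and no Arrow installation is needed.

```python
# MB readings copied from the two outputs in the issue description:
# index 0 is the baseline, indices 1..10 follow each of the 10 loads.
RUBY_MB = [34.234375, 347.625, 438.7890625, 457.6953125, 469.8046875,
           480.88671875, 487.96484375, 493.8359375, 497.671875,
           498.55859375, 501.42578125]
PYTHON_MB = [31.75390625, 382.28515625, 549.41796875, 656.78125, 679.6875,
             691.9921875, 708.73828125, 717.296875, 724.390625,
             729.19921875, 734.47265625]


def increments(readings):
    """Growth in MB between consecutive readings (one value per load)."""
    return [after - before for before, after in zip(readings, readings[1:])]


def summarize(name, readings):
    """Print growth on the first load, on the last load, and in total."""
    steps = increments(readings)
    print(f"{name}: first load +{steps[0]:.1f} MB, "
          f"last load +{steps[-1]:.1f} MB, "
          f"total +{readings[-1] - readings[0]:.1f} MB")


summarize("Ruby", RUBY_MB)      # Ruby: first load +313.4 MB, last load +2.9 MB, total +467.2 MB
summarize("Python", PYTHON_MB)  # Python: first load +350.5 MB, last load +5.3 MB, total +702.7 MB
```

In both runs the growth is largest on the first load and drops sharply afterwards, but it stays positive through all ten loads.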
> I have also tested Arrow 5.0.0 and it has the same problem.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)