[
https://issues.apache.org/jira/browse/ARROW-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513573#comment-17513573
]
Ziheng Wang commented on ARROW-16037:
-------------------------------------
Does not help.
mem usage 179580928 80000000
mem usage 179580928 80000000
mem usage 263417856 81280000
mem usage 330989568 81280000
mem usage 414253056 81280000
mem usage 476971008 81280000
mem usage 553205760 81280000
mem usage 599236608 81280000
mem usage 674279424 81280000
mem usage 709419008 81280000
mem usage 764780544 81280000
mem usage 795869184 81280000
mem usage 755544064 81280000
mem usage 817991680 81280000
mem usage 803844096 81280000
mem usage 751759360 81280000
mem usage 833671168 81280000
mem usage 780136448 81280000
mem usage 780677120 81280000
mem usage 812576768 81280000
mem usage 814198784 81280000
mem usage 794263552 81280000
mem usage 796155904 81280000
mem usage 797507584 81280000
mem usage 798318592 81280000
mem usage 799399936 81280000
mem usage 800481280 81280000
mem usage 801832960 81280000
mem usage 801832960 81280000
mem usage 814268416 81280000
mem usage 737091584 81280000
mem usage 781156352 81280000
mem usage 854958080 81280000
mem usage 825163776 81280000
mem usage 853274624 81280000
mem usage 737460224 81280000
mem usage 801800192 81280000
mem usage 810450944 81280000
mem usage 558161920 81280000
mem usage 620609536 81280000
mem usage 670892032 81280000
mem usage 733609984 81280000
mem usage 733609984 81280000
mem usage 734961664 81280000
mem usage 735502336 81280000
mem usage 623452160 81280000
mem usage 609259520 81280000
mem usage 671707136 81280000
mem usage 707751936 81280000
mem usage 707751936 81280000
mem usage 743706624 81280000
mem usage 743706624 81280000
mem usage 562147328 81280000
mem usage 624594944 81280000
mem usage 637956096 81280000
mem usage 700403712 81280000
mem usage 680206336 81280000
mem usage 725622784 81280000
mem usage 706813952 81280000
mem usage 708165632 81280000
mem usage 728440832 81280000
mem usage 789536768 81280000
mem usage 541188096 81280000
mem usage 602284032 81280000
mem usage 639320064 81280000
mem usage 670679040 81280000
mem usage 746643456 81280000
mem usage 719314944 81280000
mem usage 495579136 81280000
mem usage 567218176 81280000
mem usage 612093952 81280000
mem usage 679677952 81280000
mem usage 661270528 81280000
mem usage 712622080 81280000
mem usage 714514432 81280000
mem usage 716136448 81280000
mem usage 717217792 81280000
mem usage 771825664 81280000
mem usage 784801792 81280000
mem usage 822665216 81280000
mem usage 823205888 81280000
mem usage 823205888 81280000
mem usage 823476224 81280000
mem usage 829153280 81280000
mem usage 836722688 81280000
mem usage 471212032 81280000
mem usage 552583168 81280000
mem usage 622600192 81280000
mem usage 659906560 81280000
mem usage 730193920 81280000
mem usage 730193920 81280000
mem usage 753713152 81280000
mem usage 753983488 81280000
mem usage 727797760 81280000
mem usage 727797760 81280000
mem usage 729419776 81280000
mem usage 731041792 81280000
mem usage 732663808 81280000
mem usage 733745152 81280000
mem usage 733745152 81280000
mem usage 735367168 81280000
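
For reference, a minimal sketch for summarizing output like the above. It assumes each line has the form "mem usage <rss> <allocated>", as printed by the script quoted below. The helper names parse_mem_usage and summarize are hypothetical and not part of pyarrow, and the sample uses only three lines copied from the log.

```python
def parse_mem_usage(lines):
    """Parse 'mem usage <rss> <allocated>' lines into (rss, allocated) pairs."""
    samples = []
    for line in lines:
        parts = line.split()
        # Skip anything that does not match the expected four-field layout.
        if len(parts) == 4 and parts[:2] == ["mem", "usage"]:
            samples.append((int(parts[2]), int(parts[3])))
    return samples


def summarize(samples):
    """Return peak RSS, peak pyarrow-allocated bytes, and their ratio."""
    peak_rss = max(rss for rss, _ in samples)
    peak_allocated = max(alloc for _, alloc in samples)
    return {
        "peak_rss": peak_rss,
        "peak_allocated": peak_allocated,
        "ratio": peak_rss / peak_allocated,
    }


# Three lines copied from the log above (not the full run).
sample_log = [
    "mem usage 179580928 80000000",
    "mem usage 263417856 81280000",
    "mem usage 854958080 81280000",
]
summary = summarize(parse_mem_usage(sample_log))
print(summary)
```

Feeding the full log through the same helpers gives the peak RSS relative to what pyarrow reports as allocated.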
> Possible memory leak in compute.take
> ------------------------------------
>
> Key: ARROW-16037
> URL: https://issues.apache.org/jira/browse/ARROW-16037
> Project: Apache Arrow
> Issue Type: Bug
> Affects Versions: 6.0.1
> Environment: Ubuntu
> Reporter: Ziheng Wang
> Priority: Blocker
>
> Running the following code drives the process's memory usage (RSS) up to
> 1 GB, even though the bytes allocated by pyarrow stay at ~80 MB. The process
> memory drops to 800 MB after a while, but that is still far more than is
> necessary.
> '''
> import gc
> import os
>
> import numpy as np
> import pandas as pd
> import psutil
> import pyarrow as pa
> import pyarrow.compute as compute
>
> # 10000 x 1000 table of float64 values (~80 MB of Arrow data).
> my_table = pa.Table.from_pandas(pd.DataFrame(np.random.normal(size=(10000, 1000))))
> process = psutil.Process(os.getpid())
> print("mem usage", process.memory_info().rss, pa.total_allocated_bytes())
> for i in range(100):
>     # Process RSS vs. bytes currently allocated by pyarrow.
>     print("mem usage", process.memory_info().rss, pa.total_allocated_bytes())
>     temp = compute.sort_indices(my_table['0'], sort_keys=[('0', 'ascending')])
>     # Rebind to the reordered table so the previous one can be freed.
>     my_table = my_table.take(temp)
>     gc.collect()
> '''
--
This message was sent by Atlassian Jira
(v8.20.1#820001)