Hi Experts,

I came across the option of configuring the memory pool through the
environment variable *ARROW_DEFAULT_MEMORY_POOL* and tested it.

With the *system* memory pool I observed an improvement on macOS, but no
change on Linux. I have captured more details in GH issue
https://github.com/apache/arrow/issues/36100... If anyone can shed light on
this or suggest a way to overcome the problem, that would be very helpful.
I appreciate your help!

Regards,
Alex

On Wed, Jun 14, 2023 at 9:35 PM Jerald Alex <vminf...@gmail.com> wrote:

> Hi Experts,
>
> PyArrow's *Table.from_pylist* does not release memory until the program
> terminates. I created a sample script to highlight the issue. I also tried
> setting `pa.jemalloc_set_decay_ms(0)`, but it didn't help much.
> Could you please check this and let me know whether there are potential
> issues or any workaround to resolve this?
>
> >>> pyarrow.__version__
> '12.0.0'
>
> OS Details:
> OS: macOS 13.4 (22F66)
> Kernel Version: Darwin 22.5.0
>
> Sample code to reproduce (requires memory_profiler):
>
> #file_name: test_exec.py
> import pyarrow as pa
> import time
> import random
> import string
>
> from memory_profiler import profile
>
> def get_sample_data():
>     # One record with 15 string columns of random length.
>     record1 = {}
>     for col_id in range(15):
>         record1[f"column_{col_id}"] = string.ascii_letters[10:random.randint(17, 49)]
>
>     return [record1]
>
> def construct_data(data):
>     # Build (and immediately discard) a 100,000-row table 9 times.
>     count = 1
>     while count < 10:
>         pa.Table.from_pylist(data * 100000)
>         count += 1
>     return True
>
> @profile
> def main():
>     data = get_sample_data()
>     construct_data(data)
>     print("construct data completed!")
>
> if __name__ == "__main__":
>     main()
>     time.sleep(600)
>
> memory_profiler output:
>
> Filename: test_exec.py
>
> Line #    Mem usage    Increment  Occurrences   Line Contents
> =============================================================
>     41     65.6 MiB     65.6 MiB           1   @profile
>     42                                         def main():
>     43     65.6 MiB      0.0 MiB           1       data = get_sample_data()
>     44    203.8 MiB    138.2 MiB           1       construct_data(data)
>     45    203.8 MiB      0.0 MiB           1       print("construct data completed!")
>
> Regards,
> Alex
>
