jonded94 commented on issue #44599:
URL: https://github.com/apache/arrow/issues/44599#issuecomment-2479396735

   We have a script that launches quite a lot of Python processes that use 
something very similar to the test script that I've shown. Unfortunately, even 
with 250 GiB of RAM available on the specific host, we see OOM errors after a 
handful of iterations, so it appears that the memory is not entirely free to use 
or at least the kernel is getting uneasy.
   
   We then replaced the very few `pyarrow` methods we used with a custom, 
in-house, very shallow PyO3 wrapper around the [parquet Rust 
crate](https://docs.rs/parquet/latest/parquet/), which offers similar 
functionality. With that, we see constant memory load, regardless of how long 
the script runs, and we're using only a few dozen GiB in total; that's far less 
than the `pyarrow` implementation.
   
   I know that our use case is pretty specific, but I wanted to share our 
experience regardless. If my experience is too vague to be of any actual 
debugging value to you, we can close the issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.