westonpace commented on issue #14606:
URL: https://github.com/apache/arrow/issues/14606#issuecomment-1309172640

   Memory problems are very difficult to debug and diagnose.  There are a lot of
factors involved.  Could you perhaps create some kind of reproducible example?
   
   > Do we need to manually release the unused memory pool in 
default_shared_pool to keep memory efficient?
   
   No, you should not have to do this.  Those capabilities are mostly for
debugging and rare corner-case scenarios; they should not be needed in
regular use.
   
   > However, we observe some native memory consumption even when there is no
incoming data, and the residual memory in the native heap continues to grow
(it does not seem to be a leak) and can sometimes cause OOM issues when we are
processing very large data
   
   What sorts of API calls are you making to do this conversion? Is this only 
repeated calls to `pyarrow.parquet.write_table`?  How are you creating your 
tables / arrays?
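   As a starting point, a reproducible example could be as simple as the sketch below: build one table, call `pyarrow.parquet.write_table` in a loop, and log pool usage after each write (the table contents and output paths here are purely illustrative):
   
   ```python
   import os
   import tempfile
   
   import pyarrow as pa
   import pyarrow.parquet as pq
   
   # A small illustrative table; substitute however you actually build yours.
   table = pa.table({"x": pa.array(range(100_000), type=pa.int64())})
   pool = pa.default_memory_pool()
   
   tmpdir = tempfile.mkdtemp()
   for i in range(5):
       pq.write_table(table, os.path.join(tmpdir, f"repro_{i}.parquet"))
       # Log current and peak pool usage to see whether it keeps growing.
       print(i, pool.bytes_allocated(), pool.max_memory())
   ```
   
   If `bytes_allocated()` keeps climbing across iterations in a loop like this, that output would be very useful to attach to the issue.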


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]