pablodcar commented on issue #37801:
URL: https://github.com/apache/arrow/issues/37801#issuecomment-1744394914

   Thanks for the response.
   
   > In a process where you are continuously appending data, the better approach would be a two-step logic: first gather the batches of data in a separate Table, and only when this reaches a certain size threshold (eg 65k), actually combine the chunks of this table (involving a copy) into a single chunk, and then append this combined chunk to the overall `rates_table`.
   > 
   > It would be interesting to see when using such an approach if you still notice memory issues.
   > 
   > > and if I use `combine_chunks` from time to time, things get worse.
   > 
   > Given my explanation above, I have to admit that this is a bit strange ..
   
   Calling `combine_chunks` every 65k items, instead of a single call at the end, helps with both CPU and memory. In fact, memory usage remains almost constant: if I run the cycle twice, memory is not doubled, it only increases slightly.
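   
   For reference, this is a minimal sketch of the two-step approach described above, roughly what I plan to test. The schema and the `incoming_batches()` generator are placeholders, not my actual code:
   
   ```python
   import pyarrow as pa

   CHUNK_THRESHOLD = 65_000  # combine the staging chunks once this many rows accumulate

   # placeholder schema; the real one matches rates_table
   schema = pa.schema([("timestamp", pa.timestamp("ms")), ("rate", pa.float64())])
   rates_table = schema.empty_table()
   staging = schema.empty_table()

   for batch in incoming_batches():  # hypothetical generator yielding pyarrow.RecordBatch
       staging = pa.concat_tables([staging, pa.Table.from_batches([batch])])
       if staging.num_rows >= CHUNK_THRESHOLD:
           # single copy: collapse the many small chunks into one chunk,
           # then append that combined chunk to the overall table
           rates_table = pa.concat_tables([rates_table, staging.combine_chunks()])
           staging = schema.empty_table()

   # flush whatever is still in the staging table at the end
   if staging.num_rows > 0:
       rates_table = pa.concat_tables([rates_table, staging.combine_chunks()])
   ```
   
   This way the many small per-batch chunks only live in the staging table, so the copy done by `combine_chunks` is paid once per ~65k rows instead of over the whole `rates_table`.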
   
   Will test with a long-running process, thanks.
   
   
   

