tustvold commented on issue #4724:
URL: https://github.com/apache/arrow-rs/issues/4724#issuecomment-1688710277

   We generally try very hard to avoid copying data, as copying is a major 
bottleneck and typically not desirable. The downside, as you have discovered, is 
potentially higher memory usage. Another area where this turns up is array 
slicing, which is zero-copy in a similar manner.
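
   To illustrate why zero-copy slicing can hold on to more memory than 
expected, here is a minimal std-only sketch. It does not use the arrow-rs API: 
`View`, `slice`, and `retained_bytes` are hypothetical names standing in for an 
array that shares a reference-counted buffer.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an array backed by a shared, immutable buffer.
#[derive(Clone)]
struct View {
    buffer: Arc<[u8]>, // shared allocation, never copied by `slice`
    offset: usize,
    len: usize,
}

impl View {
    fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        View { buffer: Arc::from(data), offset: 0, len }
    }

    /// Zero-copy slice: only the offset and length change.
    fn slice(&self, offset: usize, len: usize) -> View {
        assert!(offset + len <= self.len, "slice out of bounds");
        View {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    /// The bytes this view actually exposes.
    fn values(&self) -> &[u8] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Bytes kept alive by this view, however few of them it exposes.
    fn retained_bytes(&self) -> usize {
        self.buffer.len()
    }
}
```

   In this sketch, a 4-byte slice of a 1024-byte view still reports 1024 
retained bytes after the original view is dropped, because the slice holds 
its own reference to the whole allocation.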
   
   I'm not averse to adding a kernel or function on Array to "compact" the 
underlying buffers of an array, but I wonder if you've experimented with 
writing the data in smaller batches? Ultimately, if the size of an encoded 
RecordBatch is large enough to cause concern, you're going to struggle to 
process it without blowing your memory budget regardless. A smaller batch 
size might let you get the best of both worlds: zero-copy without blowing your 
memory budget.
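
   As a rough idea of what "compacting" could mean, here is a std-only 
sketch. It is not an existing arrow-rs function: `View` and `compact` are 
hypothetical names, and the assumption is simply that compaction copies the 
visible range into a fresh, right-sized buffer so the large one can be freed.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a sliced array sharing a large buffer.
struct View {
    buffer: Arc<[u8]>,
    offset: usize,
    len: usize,
}

impl View {
    /// The bytes this view actually exposes.
    fn values(&self) -> &[u8] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Hypothetical `compact`: copy only the visible bytes into a new
    /// allocation. This trades one copy for a smaller memory footprint.
    fn compact(&self) -> View {
        let buffer: Arc<[u8]> = Arc::from(self.values());
        View { buffer, offset: 0, len: self.len }
    }
}
```

   The compacted view no longer references the original allocation, so that 
allocation is released once every other view of it is dropped. The copy is 
the cost that the zero-copy design normally avoids.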


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.