tustvold commented on issue #4724: URL: https://github.com/apache/arrow-rs/issues/4724#issuecomment-1688710277
We generally try very hard to avoid copying data, as it is a major bottleneck and typically not desirable. The downside, as you have discovered, is potentially higher memory usage. Another area where this turns up is array slicing, which is zero-copy in a similar manner.

I'm not averse to adding a kernel or function on Array to "compact" the underlying buffers of an array, but I wonder if you've experimented with writing the data in smaller batches? Ultimately, if the size of an encoded RecordBatch is large enough to cause concern, you're going to struggle to process it without blowing your memory budget regardless... A smaller batch size might let you get the best of both worlds: zero-copy without blowing your memory budget.
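
To make the trade-off concrete, here is a minimal, std-only sketch of why a zero-copy slice can keep a large buffer alive, and what a "compact" step would change. It is a toy model, not the arrow-rs API: `SlicedArray`, `retained_len`, and `compact` are hypothetical names invented for illustration, and a real Arrow array has more buffers (validity, offsets) than this single `i64` buffer.

```rust
use std::sync::Arc;

/// Toy stand-in for an Arrow array (hypothetical, not an arrow-rs type):
/// a shared, immutable buffer plus a window (offset, len) into it.
struct SlicedArray {
    buffer: Arc<[i64]>,
    offset: usize,
    len: usize,
}

impl SlicedArray {
    fn new(values: Vec<i64>) -> Self {
        let len = values.len();
        Self { buffer: values.into(), offset: 0, len }
    }

    /// Zero-copy slice: the buffer is shared, only offset and len change.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    /// The elements visible through this array's window.
    fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Number of elements kept alive by this array, including the ones
    /// outside its window.
    fn retained_len(&self) -> usize {
        self.buffer.len()
    }

    /// Hypothetical "compact": copy only the visible window into a fresh
    /// buffer, so this array no longer pins the original allocation.
    fn compact(&self) -> Self {
        Self::new(self.values().to_vec())
    }
}

fn main() {
    let big = SlicedArray::new((0..1_000_000).collect());
    let small = big.slice(10, 3);
    drop(big); // the slice still holds a reference to the full buffer

    println!(
        "sliced: {} visible, {} retained",
        small.values().len(),
        small.retained_len()
    );

    let compacted = small.compact();
    println!(
        "compacted: {} visible, {} retained",
        compacted.values().len(),
        compacted.retained_len()
    );
}
```

In this model the slice costs no copy but retains every element of the original buffer, while the compacted array pays for one copy and retains only what it exposes. Working in smaller batches sidesteps the choice, because each buffer is small to begin with.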
