Re: [I] Avoid concatenating record batches in joins to alleviate memory pressure [datafusion]

via GitHub Fri, 19 Jun 2026 08:12:48 -0700


maxburke commented on issue #23031:
URL: https://github.com/apache/datafusion/issues/23031#issuecomment-4752737768


   However efficient the core of `concat_batches` is, it will always roughly 
double peak memory as long as it takes batches by reference: the input batches 
must stay alive while the concatenated copy is built.
   
   I understand the desire not to complicate the joins, but duplicating a 
large amount of input data has a real performance cost.
   
   Making `concat_batches` more efficient also does not solve the problem of 
overflowing offsets when operating on large numbers of input records; in my 
case the input has 110 million rows, and it errors out when trying to create a 
single `StringArray` (which uses 32-bit offsets).
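
   To make the offset problem concrete, here is a rough, std-only sketch (it 
does not call any Arrow or DataFusion API). A `StringArray` stores `i32` 
offsets, so the total string bytes in one array cannot exceed `i32::MAX`. 
With 110 million rows that limit is hit at an average of about 19.5 bytes per 
value. The 19- and 20-byte averages in the usage note, and the helper names 
`fits_in_i32_offsets` and `plan_groups`, are assumptions made up for 
illustration, not numbers or functions from my workload or from DataFusion.

```rust
/// Largest total value-byte length addressable with i32 offsets.
const MAX_I32_OFFSET: u64 = i32::MAX as u64;

/// Returns true if `total_bytes` of string data fits in a single
/// i32-offset string array.
fn fits_in_i32_offsets(total_bytes: u64) -> bool {
    total_bytes <= MAX_I32_OFFSET
}

/// Hypothetical helper: greedily groups consecutive batches, described only
/// by the byte size of one string column, so that each group's total stays
/// within the i32 offset limit. Returns half-open index ranges (start, end).
/// Assumes every individual batch already fits on its own.
fn plan_groups(batch_bytes: &[u64]) -> Vec<(usize, usize)> {
    let mut groups = Vec::new();
    let mut start = 0;
    let mut acc = 0u64;
    for (i, &bytes) in batch_bytes.iter().enumerate() {
        // Close the current group before it would exceed the limit.
        if i > start && acc + bytes > MAX_I32_OFFSET {
            groups.push((start, i));
            start = i;
            acc = 0;
        }
        acc += bytes;
    }
    if start < batch_bytes.len() {
        groups.push((start, batch_bytes.len()));
    }
    groups
}
```

   For example, `fits_in_i32_offsets(110_000_000 * 20)` is false while 
`fits_in_i32_offsets(110_000_000 * 19)` is true. Keeping the build side as 
several smaller groups, as `plan_groups` sketches, would sidestep both the 
overflow and the extra copy; switching to 64-bit offsets (`LargeStringArray`) 
would only address the overflow.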


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

