maxburke commented on issue #23031: URL: https://github.com/apache/datafusion/issues/23031#issuecomment-4752737768
Whether or not the core of `concat_batches` is efficient, it will always double the memory consumed as long as it takes batches by reference, because the input batches must stay alive while the concatenated copy is built. I understand the desire not to complicate the joins, but duplicating a large amount of input data has a real performance cost. Making `concat_batches` more efficient also does not solve the problem of overflowing offsets when operating on large numbers of input records; in my case the input has 110 million rows, and it errors out when trying to create a single `StringArray`.
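To make the offset problem concrete, here is a rough, standard-library-only sketch of the arithmetic. It relies on the fact that `StringArray` stores value offsets as `i32`, so the total string bytes in one array cannot exceed `i32::MAX`. The helper names (`fits_in_i32_offsets`, `min_chunks`) and the average string lengths are hypothetical and chosen only for illustration; they are not measurements from my data or part of any Arrow or DataFusion API.

```rust
/// Largest total number of value bytes a single i32-offset string array can address.
const MAX_I32_OFFSET: u64 = i32::MAX as u64;

/// Hypothetical helper: would `rows` strings averaging `avg_bytes` each
/// fit into one array with i32 offsets?
fn fits_in_i32_offsets(rows: u64, avg_bytes: u64) -> bool {
    rows.checked_mul(avg_bytes)
        .map_or(false, |total| total <= MAX_I32_OFFSET)
}

/// Hypothetical helper: minimum number of i32-offset arrays needed to hold
/// `total_bytes` of string data if the column is kept chunked instead of
/// being concatenated into one array.
fn min_chunks(total_bytes: u64) -> u64 {
    // Ceiling division without relying on newer integer helpers.
    (total_bytes + MAX_I32_OFFSET - 1) / MAX_I32_OFFSET
}

fn main() {
    let rows: u64 = 110_000_000;
    // Assumed average string lengths in bytes (illustrative only).
    for avg_bytes in [8u64, 19, 20, 32] {
        let total = rows * avg_bytes;
        println!(
            "avg {:>2} B/row: total {} B, fits in one array: {}, chunks needed: {}",
            avg_bytes,
            total,
            fits_in_i32_offsets(rows, avg_bytes),
            min_chunks(total)
        );
    }
}
```

Under these assumptions, 110 million rows stop fitting in a single array once the average string reaches about 20 bytes, regardless of how fast the copy itself is.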
