tustvold commented on PR #6300: URL: https://github.com/apache/arrow-rs/pull/6300#issuecomment-2325092356
> Beyond the fact that it is likely much more work, it does sound like an antipattern though: given that Arrow arrays are immutable once created, is there any situation in which you would ever want all that extra capacity to hang around?

When the buffer is merely an intermediate that won't live for very long. This is extremely common in query processing, and in fact one of the major motivations for the new binary view types is to avoid copying at the expense of less efficient memory usage.

> Memory usage divided by 2, irrelevant of the allocator used.

I believe this is only the case where the bump allocator is being used, which is almost always terrible from a performance standpoint. Provided capacity estimation is done correctly, as most of the kernels go to great pains to do, this shouldn't be a major issue. Perhaps there is a particular codepath you are running into that is not doing this?

> Adding another API (if it doesn't already exist) to "shrink_to_fit" for Arrays in general

I think this would be my preference, if only because it is much easier to understand the ramifications of such a change. Such a method should be relatively easy to add to the relevant ArrayBuilders, and therefore not hugely disruptive.
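
To make the trade-off concrete, here is a minimal sketch that uses a plain `Vec<u8>` as a stand-in for an Arrow buffer. It does not use any arrow-rs API, and the helper names `build_with_estimate` and `build_by_growth` are hypothetical. It only illustrates the difference between estimating capacity up front and growing incrementally, followed by an explicit, opt-in `shrink_to_fit`.

```rust
/// Hypothetical helper: the final length is known, so capacity is reserved once.
/// This mirrors what a kernel with a correct capacity estimate would do.
fn build_with_estimate(values: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(values.len());
    buf.extend_from_slice(values);
    buf
}

/// Hypothetical helper: no estimate is available, so the buffer grows as it is
/// filled and may end up with spare capacity beyond its length.
fn build_by_growth(values: &[u8]) -> Vec<u8> {
    let mut buf = Vec::new();
    for &v in values {
        buf.push(v);
    }
    buf
}

fn main() {
    let values = vec![7u8; 1025];

    let estimated = build_with_estimate(&values);
    println!(
        "estimated: len={} capacity={}",
        estimated.len(),
        estimated.capacity()
    );

    let mut grown = build_by_growth(&values);
    println!("grown: len={} capacity={}", grown.len(), grown.capacity());

    // Opt-in shrink: worthwhile for long-lived data, wasted work for a
    // short-lived intermediate because it may reallocate and copy.
    grown.shrink_to_fit();
    println!("shrunk: len={} capacity={}", grown.len(), grown.capacity());
}
```

The exact capacities printed depend on the standard library's growth strategy and the allocator, so none are shown here. Keeping the shrink as an explicit call leaves the decision with the caller, who knows whether the buffer is long-lived.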
