tustvold commented on PR #6300:
URL: https://github.com/apache/arrow-rs/pull/6300#issuecomment-2325092356

   > Beyond the fact that it is likely much more work, it does sound like an 
antipattern though: given that Arrow arrays are immutable once created, is 
there any situation in which you would ever want all that extra capacity to 
hang around?
   
   When the buffer is merely an intermediate that won't live for very long. 
This is extremely common in query processing, and in fact one of the major 
motivations for the new binary view types is to avoid copying at the expense of 
less efficient memory usage.
   
   > Memory usage divided by 2, irrelevant of the allocator used.
   
   I believe this is only the case where the bump allocator is being used, 
which is almost always terrible from a performance standpoint. Provided 
capacity estimation is done correctly, as most of the kernels go to great pains 
to do, this shouldn't be a major issue. Perhaps there is a particular codepath 
you are running into that is not doing this?
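
   As a rough illustration of why capacity estimation matters, here is a minimal sketch that uses a plain `Vec` as a stand-in for an Arrow buffer. The helper names `grow_by_push` and `grow_with_estimate` are hypothetical and are not part of arrow-rs. The exact growth policy of `Vec` is an implementation detail, so the only thing relied on here is that capacity is at least the length.

```rust
/// Appends `n` values to a vector created without a capacity hint and
/// returns `(len, capacity)`. Because the vector has to grow as it goes,
/// the final capacity may exceed `n` by a large margin.
fn grow_by_push(n: usize) -> (usize, usize) {
    let mut v: Vec<u64> = Vec::new();
    for i in 0..n {
        v.push(i as u64);
    }
    (v.len(), v.capacity())
}

/// Appends `n` values to a vector whose capacity was estimated up front and
/// returns `(len, capacity)`. No reallocation is needed while filling it,
/// because the requested capacity already covers every element.
fn grow_with_estimate(n: usize) -> (usize, usize) {
    let mut v: Vec<u64> = Vec::with_capacity(n);
    for i in 0..n {
        v.push(i as u64);
    }
    (v.len(), v.capacity())
}
```

   Comparing the two returned capacities for the same `n` shows how much slack, if any, the unhinted path leaves behind on a given toolchain.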
   
   > Adding another API (if it doesn't already exist) to "shrink_to_fit" for 
Arrays in general
   
   I think this would be my preference, if only because it is much easier to 
understand the ramifications of such a change. Such a method should be 
relatively easy to add to the relevant ArrayBuilders, and therefore not hugely 
disruptive.
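
   To make the idea concrete, here is a sketch of what such a method could look like on a builder. `ToyBuilder` is a hypothetical type backed by a `Vec`, not an arrow-rs builder, and the real signature would be up to the PR. Note that `Vec::shrink_to_fit` is allowed to leave some excess capacity, so the sketch only assumes that capacity does not grow.

```rust
/// Hypothetical stand-in for an array builder; not an arrow-rs type.
struct ToyBuilder {
    values: Vec<i64>,
}

impl ToyBuilder {
    /// Creates a builder with room for at least `capacity` values.
    fn with_capacity(capacity: usize) -> Self {
        Self { values: Vec::with_capacity(capacity) }
    }

    /// Appends a single value.
    fn append(&mut self, v: i64) {
        self.values.push(v);
    }

    /// Number of values appended so far.
    fn len(&self) -> usize {
        self.values.len()
    }

    /// Number of values the builder can hold without reallocating.
    fn capacity(&self) -> usize {
        self.values.capacity()
    }

    /// Releases unused capacity on request. Callers who treat the result
    /// as a short-lived intermediate can simply skip this call.
    fn shrink_to_fit(&mut self) {
        self.values.shrink_to_fit();
    }

    /// Consumes the builder and returns the values as a fixed-size slice.
    fn finish(self) -> Box<[i64]> {
        self.values.into_boxed_slice()
    }
}
```

   Keeping the shrink step opt-in means the over-estimated, short-lived case pays nothing, while long-lived arrays can trade one extra reallocation for a tighter footprint.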

