tustvold commented on issue #6692:
URL: https://github.com/apache/arrow-rs/issues/6692#issuecomment-2786116734

   Perhaps it would be worth first pinning down which use-cases we are trying to 
speed up with this effort, so that we design something that actually addresses 
them?
   
   If we're just talking about PrimitiveArray and StringViewArray types, then I 
suspect any performance delta is likely to be relatively minor as concatenating 
such arrays is already extremely cheap.
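
   The "already extremely cheap" point can be made concrete with a minimal std-only Rust sketch. The types here are hypothetical simplifications, not the arrow-rs API: `concat_prim`, `View`, `ViewArray` and `concat_views` are illustrative names, validity bitmaps are omitted, and real StringView also inlines strings of up to 12 bytes. Primitive concatenation comes down to one allocation plus a memcpy per input. StringView concatenation only copies the fixed-size views and shares the data buffers by reference count.

```rust
use std::sync::Arc;

/// Primitive concatenation in this simplified model: one allocation, then
/// one memcpy (`extend_from_slice`) per input. Validity bitmaps are omitted.
fn concat_prim(arrays: &[&[i64]]) -> Vec<i64> {
    let len: usize = arrays.iter().map(|a| a.len()).sum();
    let mut out = Vec::with_capacity(len);
    for a in arrays {
        out.extend_from_slice(a);
    }
    out
}

/// Hypothetical, simplified string view: every string lives in a shared data
/// buffer (real StringView also inlines strings of up to 12 bytes).
#[derive(Debug, Clone, Copy, PartialEq)]
struct View {
    len: u32,
    buffer: u32,
    offset: u32,
}

#[derive(Debug, Clone)]
struct ViewArray {
    views: Vec<View>,
    buffers: Vec<Arc<[u8]>>,
}

impl ViewArray {
    fn value(&self, i: usize) -> &str {
        let v = self.views[i];
        let buf: &[u8] = &self.buffers[v.buffer as usize];
        let start = v.offset as usize;
        std::str::from_utf8(&buf[start..start + v.len as usize]).expect("valid UTF-8")
    }
}

/// Concatenate view arrays: copy the fixed-size views (rebasing their buffer
/// indices) and share the data buffers by bumping reference counts.
/// No string bytes are copied.
fn concat_views(arrays: &[&ViewArray]) -> ViewArray {
    let mut views: Vec<View> = Vec::new();
    let mut buffers: Vec<Arc<[u8]>> = Vec::new();
    for a in arrays {
        let base = buffers.len() as u32;
        views.extend(a.views.iter().map(|v| View { buffer: v.buffer + base, ..*v }));
        buffers.extend(a.buffers.iter().cloned());
    }
    ViewArray { views, buffers }
}
```

In both cases the work is proportional to the number of rows (or views) and does not depend on how many string bytes are involved, which is why an alternative append interface has little room to improve on it.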
   
   If, however, we're looking to improve the performance of DictionaryArray, 
this becomes a whole different can of worms as any append-based interface is 
likely to struggle to efficiently handle arrays with heterogeneous dictionary 
values. I'm not sure if there is a good solution here tbh.
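
   The following rough sketch shows why heterogeneous dictionaries are awkward for an append interface. It assumes a hypothetical simplified `DictArray`, which is not the arrow-rs `DictionaryArray` API, and uses just one possible strategy: merging with deduplication. Under that strategy, each appended batch's dictionary values have to be hashed and looked up, and every key rewritten. The cost therefore depends on dictionary size as well as row count.

```rust
use std::collections::HashMap;

/// Hypothetical simplified dictionary array: `keys[i]` indexes into `values`.
#[derive(Debug, Clone, PartialEq)]
struct DictArray {
    keys: Vec<u32>,
    values: Vec<String>,
}

impl DictArray {
    fn value(&self, i: usize) -> &str {
        &self.values[self.keys[i] as usize]
    }
}

/// Append dictionary arrays whose dictionaries may differ, deduplicating
/// values into one merged dictionary and remapping every key.
fn append_dicts(arrays: &[&DictArray]) -> DictArray {
    let mut values: Vec<String> = Vec::new();
    let mut index: HashMap<String, u32> = HashMap::new();
    let mut keys: Vec<u32> = Vec::new();
    for a in arrays {
        // Map each position in this array's local dictionary to its merged position.
        let remap: Vec<u32> = a
            .values
            .iter()
            .map(|v| {
                *index.entry(v.clone()).or_insert_with(|| {
                    values.push(v.clone());
                    (values.len() - 1) as u32
                })
            })
            .collect();
        // Every key of every appended array has to be rewritten.
        keys.extend(a.keys.iter().map(|&k| remap[k as usize]));
    }
    DictArray { keys, values }
}
```

Appending dictionaries verbatim without deduplication would avoid the hashing, but the merged dictionary would then grow with every batch. Neither option is free.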
   
   The only array types where I could see such an append interface offering 
compelling performance benefits are (Large)StringArray, as it would allow 
eliding potentially large string copies. That said, this relies on knowing the 
total amount of string data up-front, which an append interface won't 
necessarily know, and such use-cases should probably just use 
StringViewArray...
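
   Here is a minimal sketch of the (Large)StringArray case, assuming a hypothetical simplified offsets-plus-bytes `StringArray` type that is not the arrow-rs API. When all inputs are known up front, the total byte length can be computed first. The output data buffer is then allocated once, and each input's bytes are copied exactly once. An append interface that sees batches one at a time has no `total_bytes` to size with, so it would have to grow the buffer incrementally.

```rust
/// Hypothetical simplified (Large)StringArray: value `i` is
/// `data[offsets[i]..offsets[i + 1]]`, with i64 offsets as in LargeStringArray.
#[derive(Debug, Clone, PartialEq)]
struct StringArray {
    offsets: Vec<i64>,
    data: Vec<u8>,
}

impl StringArray {
    fn from_strs(strs: &[&str]) -> Self {
        let mut offsets = vec![0i64];
        let mut data = Vec::new();
        for s in strs {
            data.extend_from_slice(s.as_bytes());
            offsets.push(data.len() as i64);
        }
        Self { offsets, data }
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Number of string bytes referenced (offsets need not start at zero).
    fn byte_len(&self) -> usize {
        (self.offsets[self.len()] - self.offsets[0]) as usize
    }

    fn value(&self, i: usize) -> &str {
        let (start, end) = (self.offsets[i] as usize, self.offsets[i + 1] as usize);
        std::str::from_utf8(&self.data[start..end]).expect("valid UTF-8")
    }
}

/// Concatenate when every input is known up front: both buffers are sized
/// exactly, so `data` is allocated once and each string byte is copied once.
fn concat_strings(arrays: &[&StringArray]) -> StringArray {
    let total_bytes: usize = arrays.iter().map(|a| a.byte_len()).sum();
    let total_rows: usize = arrays.iter().map(|a| a.len()).sum();
    let mut data: Vec<u8> = Vec::with_capacity(total_bytes);
    let mut offsets: Vec<i64> = Vec::with_capacity(total_rows + 1);
    offsets.push(0);
    for a in arrays {
        let first = a.offsets[0];
        let last = a.offsets[a.len()];
        let base = data.len() as i64;
        // Rebase this input's offsets onto the end of the output buffer.
        offsets.extend(a.offsets[1..].iter().map(|o| o - first + base));
        data.extend_from_slice(&a.data[first as usize..last as usize]);
    }
    StringArray { offsets, data }
}
```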
   
   The initial issue also stated:
   
   > Memory Overhead / Performance Overhead for GarbageCollecting StringView: 
Buffering up several RecordBatches with StringView may consume significant 
amounts of memory for mostly filtered rows, which requires us to run gc 
periodically which actually slows some things down (see 
https://github.com/apache/datafusion/issues/11628)
   
   But I am honestly not sure how an append interface changes this: you need to 
perform some sort of GC at some point, and it is unclear to me why doing it as 
part of a Coalesce operation rather than as part of the filter itself would 
behave materially differently...
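
   The following sketch illustrates why the GC cost doesn't obviously go away. It uses hypothetical simplified types: `View`, `ViewArray` and `gc` are illustrative names, not the arrow-rs API, and real StringView also inlines short strings. After a selective filter, the surviving views may reference only a few bytes of a large buffer. Compacting them means copying exactly the live bytes into a fresh buffer, whether that happens inside the filter or in a later coalesce step.

```rust
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq)]
struct View {
    len: u32,
    buffer: u32,
    offset: u32,
}

#[derive(Debug, Clone)]
struct ViewArray {
    views: Vec<View>,
    buffers: Vec<Arc<[u8]>>,
}

impl ViewArray {
    fn value(&self, i: usize) -> &[u8] {
        let v = self.views[i];
        let buf: &[u8] = &self.buffers[v.buffer as usize];
        let start = v.offset as usize;
        &buf[start..start + v.len as usize]
    }

    /// Bytes kept alive by the buffers this array holds.
    fn retained_bytes(&self) -> usize {
        self.buffers.iter().map(|b| b.len()).sum()
    }

    /// Bytes actually referenced by the views.
    fn live_bytes(&self) -> usize {
        self.views.iter().map(|v| v.len as usize).sum()
    }

    /// Compact: copy only the referenced bytes into one fresh buffer and
    /// rewrite the views to point into it. The copy is proportional to the
    /// live bytes no matter which operator performs it.
    fn gc(&self) -> ViewArray {
        let mut data: Vec<u8> = Vec::with_capacity(self.live_bytes());
        let views: Vec<View> = self
            .views
            .iter()
            .enumerate()
            .map(|(i, v)| {
                let offset = data.len() as u32;
                data.extend_from_slice(self.value(i));
                View { len: v.len, buffer: 0, offset }
            })
            .collect();
        ViewArray { views, buffers: vec![Arc::from(data)] }
    }
}
```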


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.