efredine commented on issue #11281:
URL: https://github.com/apache/datafusion/issues/11281#issuecomment-2212470949

   Yes - the primary (initial) problem is that the collection needs to be built 
so that it owns the references to the items, but we want to do that without 
creating any intermediate values.
   
   I would also have naively expected `from_iter` to perform 
comparably. I did notice that there are two `FromIterator` implementations for 
`GenericByteArray` and I'm not clear which one would be chosen here:
   
https://github.com/apache/arrow-rs/blob/master/arrow-array/src/array/byte_array.rs#L534-L552
   
   The choice depends on lifetimes, so perhaps the other one is being invoked 
and it's not pre-allocating capacity as efficiently?
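
   To make the dispatch question concrete, here is a minimal std-only sketch. 
`ToyByteArray` and everything in it are hypothetical stand-ins, not the 
arrow-rs code. It only shows how two `FromIterator` impls can coexist on one 
type, with the one that runs selected by whether the iterator yields owned or 
borrowed `Option` items. It also assumes both impls take the same `size_hint` 
based capacity, which may not match what arrow-rs does:

```rust
use std::iter::FromIterator;

/// Toy stand-in for a variable-length byte array (hypothetical, not arrow-rs):
/// all values live in one buffer, delimited by offsets.
#[derive(Debug)]
struct ToyByteArray {
    offsets: Vec<usize>,      // value i is values[offsets[i]..offsets[i + 1]]
    values: Vec<u8>,          // concatenated bytes of all non-null values
    valid: Vec<bool>,         // false marks a null slot
    built_from: &'static str, // records which impl ran, for illustration only
}

impl ToyByteArray {
    /// Reserve room for `items` slots. The byte buffer is not sized up front
    /// because the value lengths are not known yet.
    fn with_capacity(items: usize, built_from: &'static str) -> Self {
        let mut offsets = Vec::with_capacity(items + 1);
        offsets.push(0);
        ToyByteArray {
            offsets,
            values: Vec::new(),
            valid: Vec::with_capacity(items),
            built_from,
        }
    }

    /// Copy the bytes straight into the shared buffer; no per-item allocation.
    fn push(&mut self, value: Option<&[u8]>) {
        if let Some(bytes) = value {
            self.values.extend_from_slice(bytes);
        }
        self.valid.push(value.is_some());
        self.offsets.push(self.values.len());
    }

    fn len(&self) -> usize {
        self.valid.len()
    }

    fn value(&self, i: usize) -> Option<&[u8]> {
        if self.valid[i] {
            Some(&self.values[self.offsets[i]..self.offsets[i + 1]])
        } else {
            None
        }
    }
}

// Selected when the iterator yields owned `Option<P>` items.
impl<P: AsRef<[u8]>> FromIterator<Option<P>> for ToyByteArray {
    fn from_iter<I: IntoIterator<Item = Option<P>>>(iter: I) -> Self {
        let iter = iter.into_iter();
        let (lower, _) = iter.size_hint();
        let mut out = ToyByteArray::with_capacity(lower, "owned items");
        for item in iter {
            out.push(item.as_ref().map(|p| p.as_ref()));
        }
        out
    }
}

// Selected when the iterator yields borrowed `&Option<P>` items.
impl<'a, P: AsRef<[u8]> + 'a> FromIterator<&'a Option<P>> for ToyByteArray {
    fn from_iter<I: IntoIterator<Item = &'a Option<P>>>(iter: I) -> Self {
        let iter = iter.into_iter();
        let (lower, _) = iter.size_hint();
        let mut out = ToyByteArray::with_capacity(lower, "borrowed items");
        for item in iter {
            out.push(item.as_ref().map(|p| p.as_ref()));
        }
        out
    }
}
```

   Given `let src = vec![Some("a"), None];`, `src.iter().collect::<ToyByteArray>()` 
runs the borrowed impl and `src.into_iter().collect::<ToyByteArray>()` runs the 
owned one. In this toy the two differ only in the item type, so any 
performance gap in the real code would have to come from how its two impls 
are actually written.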
   
   In general, I think you're right that we should be able to eliminate all 
intermediate vectors. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

