efredine commented on issue #11281: URL: https://github.com/apache/datafusion/issues/11281#issuecomment-2212470949
Yes - the primary (initial) problem is that the collection needs to be built so that it owns the references to the items, but we want to do that without creating any intermediate values. I would also have naively expected `from_iter` to perform comparably.

I did notice that there are two implementations for `GenericByteArray`, and I'm not clear which one would be chosen here: https://github.com/apache/arrow-rs/blob/master/arrow-array/src/array/byte_array.rs#L534-L552

The choice depends on lifetimes, so perhaps the other one is being invoked and it is not pre-allocating capacity as efficiently? In general, I think you're right that we should be able to eliminate all intermediate vectors.
