jorgecarleitao commented on pull request #8118:
URL: https://github.com/apache/arrow/pull/8118#issuecomment-695335519


   @nevi-me and @andygrove, I reverted the change to the builder, so this PR
is now purely additive.
   
   @andygrove, regarding building the array dynamically: note that a
StructArray consists almost entirely of child data; the struct itself is just a
null bitmap and some pointers to its children. Therefore, the cost of building
a StructArray will always be dominated by the allocation of the children's
buffers.
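
To make the layout argument concrete, here is a minimal sketch using toy
types. `ChildArray` and `ToyStructArray` are hypothetical names for
illustration only, not the arrow crate's API; the sketch assumes that children
are shared via reference counting and that all rows are valid.

```rust
use std::sync::Arc;

/// Toy stand-in for a child array: one contiguous buffer of values.
/// (Hypothetical type, not the arrow crate's array data.)
#[derive(Debug)]
struct ChildArray {
    values: Vec<i64>,
}

/// Toy struct array: a validity bitmap plus shared pointers to children.
#[derive(Debug)]
struct ToyStructArray {
    /// One bit per row; a set bit means the row is valid (non-null).
    validity: Vec<u8>,
    /// Reference-counted children; cloning an `Arc` copies no values.
    children: Vec<Arc<ChildArray>>,
    len: usize,
}

impl ToyStructArray {
    /// Assembles a struct from already-built children of equal length.
    /// The only allocations here are the bitmap and the pointer vector.
    /// Returns `None` if there are no children or their lengths differ.
    fn from_children(children: Vec<Arc<ChildArray>>) -> Option<Self> {
        let len = children.first()?.values.len();
        if children.iter().any(|c| c.values.len() != len) {
            return None;
        }
        // All rows valid: ceil(len / 8) bytes with every bit set.
        let validity = vec![0xFF; (len + 7) / 8];
        Some(ToyStructArray { validity, children, len })
    }

    /// Returns true if `row` is in bounds and its validity bit is set.
    fn is_valid(&self, row: usize) -> bool {
        row < self.len && (self.validity[row / 8] & (1u8 << (row % 8))) != 0
    }
}
```

In this model, assembling the struct touches none of the child values, so
nearly all of the allocation cost sits in building the children.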
   
   That said, you are right that, when creating the fields, the benchmark
clones the arrays, whereas a builder would build them on the fly and thus
reduce the memory footprint.
   
   IMO that issue is separate from the creation of the struct itself (though
related to building its children): it is the problem the builders aimed to
solve, namely how to efficiently build non-struct arrays without first
allocating vectors. I am outlining some of this in #8211, which allows building
primitive arrays from an iterator without exposing an unsafe API to users, and
which would avoid the double allocation that you refer to.
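
As a rough illustration of the iterator idea (not the code proposed in #8211),
the sketch below collects an iterator of optional values straight into a toy
array's buffers, with no intermediate `Vec<Option<i64>>` and no `unsafe`.
`ToyInt64Array` is a hypothetical type, and it stores validity as `Vec<bool>`
rather than a packed bitmap to keep the example short.

```rust
use std::iter::FromIterator;

/// Toy nullable primitive array (hypothetical; not the arrow crate's type).
#[derive(Debug, PartialEq)]
struct ToyInt64Array {
    values: Vec<i64>,
    validity: Vec<bool>,
}

impl FromIterator<Option<i64>> for ToyInt64Array {
    fn from_iter<I: IntoIterator<Item = Option<i64>>>(iter: I) -> Self {
        let iter = iter.into_iter();
        // Reserve once, using the iterator's lower size bound.
        let (lower, _) = iter.size_hint();
        let mut values = Vec::with_capacity(lower);
        let mut validity = Vec::with_capacity(lower);
        for item in iter {
            // Nulls store a placeholder value and a `false` validity flag.
            values.push(item.unwrap_or_default());
            validity.push(item.is_some());
        }
        ToyInt64Array { values, validity }
    }
}
```

With this in place, a caller can write something like
`let arr: ToyInt64Array = (0..4).map(Some).collect();` and the values are
written directly into the final buffers in a single pass.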



