jorgecarleitao commented on pull request #8118: URL: https://github.com/apache/arrow/pull/8118#issuecomment-695335519
@nevi-me and @andygrove, I reverted the change to the builder, so that this is now an additive PR.

@andygrove, wrt dynamically building the array: note that a StructArray is composed almost entirely of child data; the struct itself is just a null bitmap and some pointers. Therefore, the cost of building a struct will always be driven by the allocation of the children's buffers.

With that said, you are right that during the creation of the fields the benchmark clones the arrays, while a builder would build them on the fly and thus reduce the memory footprint. IMO that issue is separate from the creation of the struct itself (though related to building its children): it is the problem of how to efficiently build non-struct arrays without first allocating vectors, which is what the builders aim to solve.

I am outlining some of this in #8211, which allows building primitive arrays from an iterator without exposing an unsafe API to users, and would avoid the double allocation that you refer to.
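To make the two points above concrete, here is a minimal, std-only sketch. It is not the arrow crate's API and not the code in #8211: `PrimitiveArray`, `from_iter_opt`, `StructArray`, and `try_new` are hypothetical names, the values are fixed to `i32`, and the struct's own null bitmap is left out for brevity. It only illustrates (a) filling a values buffer and a validity bitmap in one pass over an iterator, with no intermediate `Vec<Option<i32>>` and no `unsafe`, and (b) a struct that holds nothing but shared pointers to its children.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a primitive array: a values buffer plus a
/// validity bitmap (bit i set means slot i is non-null).
#[derive(Debug)]
pub struct PrimitiveArray {
    values: Vec<i32>,
    validity: Vec<u8>,
    len: usize,
}

impl PrimitiveArray {
    /// Builds the array in a single pass over the iterator, writing directly
    /// into the final buffers instead of collecting into a temporary vector.
    pub fn from_iter_opt<I>(iter: I) -> Self
    where
        I: IntoIterator<Item = Option<i32>>,
    {
        let iter = iter.into_iter();
        // Use the lower bound of the size hint to reserve capacity up front.
        let (lower, _) = iter.size_hint();
        let mut values = Vec::with_capacity(lower);
        let mut validity = Vec::with_capacity((lower + 7) / 8);
        let mut len = 0usize;
        for item in iter {
            if len % 8 == 0 {
                // Start a new bitmap byte every eight slots.
                validity.push(0u8);
            }
            match item {
                Some(v) => {
                    values.push(v);
                    validity[len / 8] |= 1 << (len % 8);
                }
                // Null slots still occupy a (zeroed) position in the values buffer.
                None => values.push(0),
            }
            len += 1;
        }
        Self { values, validity, len }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn value(&self, i: usize) -> Option<i32> {
        assert!(i < self.len, "index out of bounds");
        if self.validity[i / 8] & (1 << (i % 8)) != 0 {
            Some(self.values[i])
        } else {
            None
        }
    }
}

/// Hypothetical struct array: only field names and shared pointers to the
/// children. Constructing it allocates no value buffers of its own.
pub struct StructArray {
    fields: Vec<(String, Arc<PrimitiveArray>)>,
    len: usize,
}

impl StructArray {
    /// Checks that all children have the same length, then stores the pointers.
    pub fn try_new(fields: Vec<(String, Arc<PrimitiveArray>)>) -> Result<Self, String> {
        let len = fields.first().map(|(_, a)| a.len()).unwrap_or(0);
        if let Some((name, _)) = fields.iter().find(|(_, a)| a.len() != len) {
            return Err(format!("field '{}' has a different length", name));
        }
        Ok(Self { fields, len })
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn column(&self, i: usize) -> &Arc<PrimitiveArray> {
        &self.fields[i].1
    }
}
```

In this sketch, passing `Arc::clone(&child)` to `StructArray::try_new` copies a pointer rather than the child's buffers, so all of the allocation cost sits in `from_iter_opt`, which is the part an iterator-based constructor would address.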