jorgecarleitao commented on pull request #8303: URL: https://github.com/apache/arrow/pull/8303#issuecomment-701328143
> @jorgecarleitao you said:
>
> > IMO we should follow up on this: for kernels we have been using a mutable buffer with null masks as much as possible.
>
> Can you perhaps let me know what you mean by this? Perhaps there is an example in the code you are thinking of?

I am really sorry, @alamb, I should have offered more context in the first place. :/

This in no way blocks this PR: IMO it is ready to merge if the relevant tests pass.

What I meant is that this code currently:

* creates `Vec<Option<T>>` through an iteration
* copies `Vec<Option<T>>` to the two buffers (when `from_opt_vec` is called)

It may be more efficient to create the buffers during the iteration, so that we avoid the copy (Vec -> buffers). In other words, the code in `from_opt_vec` could have been "injected" into the filter execution, where the `MutableBuffer` and the offsets and values buffers are created before the loop, and new elements are written directly to them.

Does this make any sense?

(As a side note, this is why I am proposing #8211: IMO there is some boilerplate copy-pasting to 1. initialize buffers, 2. iterate, 3. create `ArrayData` from buffers, which will continue to grow as we add more kernels, and whose pattern seems to be a `FromIter of fixed size`.)
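The difference between the two approaches can be sketched roughly as follows. This is a minimal illustration in plain Rust, not the code from this PR: it does not use the Arrow crate, and `Buffers`, `filter_two_pass`, `filter_single_pass`, and the `Vec<u8>` validity bitmap are hypothetical stand-ins for the real buffers. The local `from_opt_vec` only mimics the role described above (copying a `Vec<Option<T>>` into a values buffer and a null mask).

```rust
/// Hypothetical stand-in for a primitive array: a values buffer plus a
/// validity bitmap (bit i set means slot i is non-null).
#[derive(Debug, PartialEq)]
struct Buffers {
    values: Vec<i32>,
    validity: Vec<u8>,
    len: usize,
}

/// Copies a `Vec<Option<i32>>` into the two buffers. This only mimics the
/// role of `from_opt_vec`; it is not the Arrow implementation.
fn from_opt_vec(data: Vec<Option<i32>>) -> Buffers {
    let len = data.len();
    let mut values = Vec::with_capacity(len);
    let mut validity = vec![0u8; (len + 7) / 8];
    for (i, item) in data.into_iter().enumerate() {
        match item {
            Some(v) => {
                values.push(v);
                validity[i / 8] |= 1u8 << (i % 8);
            }
            // Null slots still occupy a (zeroed) position in the values buffer.
            None => values.push(0),
        }
    }
    Buffers { values, validity, len }
}

/// Two passes: build an intermediate `Vec<Option<i32>>`, then copy it.
fn filter_two_pass(input: &[Option<i32>], mask: &[bool]) -> Buffers {
    let kept: Vec<Option<i32>> = input
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(v, _)| *v)
        .collect();
    from_opt_vec(kept)
}

/// Single pass: buffers are created before the loop and written to directly,
/// so no intermediate `Vec<Option<i32>>` is allocated or copied.
fn filter_single_pass(input: &[Option<i32>], mask: &[bool]) -> Buffers {
    // Upper bound on the number of output slots.
    let capacity = input.len().min(mask.len());
    let mut values = Vec::with_capacity(capacity);
    let mut validity = vec![0u8; (capacity + 7) / 8];
    let mut len = 0;
    for (item, keep) in input.iter().zip(mask) {
        if !*keep {
            continue;
        }
        match item {
            Some(v) => {
                values.push(*v);
                validity[len / 8] |= 1u8 << (len % 8);
            }
            None => values.push(0),
        }
        len += 1;
    }
    // Drop bitmap bytes that ended up unused because rows were filtered out.
    validity.truncate((len + 7) / 8);
    Buffers { values, validity, len }
}

fn main() {
    let input = [Some(1), None, Some(3), Some(4)];
    let mask = [true, true, false, true];
    // Both strategies must produce identical buffers.
    assert_eq!(
        filter_two_pass(&input, &mask),
        filter_single_pass(&input, &mask)
    );
    println!("{:?}", filter_single_pass(&input, &mask));
}
```

Both functions follow the same three steps (initialize buffers, iterate, assemble the result); the single-pass version simply skips the intermediate vector, which is the copy the comment suggests avoiding.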
