jorgecarleitao commented on pull request #8303:
URL: https://github.com/apache/arrow/pull/8303#issuecomment-701328143


   > @jorgecarleitao you said:
   > 
   > > IMO we should follow up on this: for kernels we have been using a mutable buffer with null masks as much as possible.
   > 
   > Can you perhaps let me know what you mean by this? Perhaps there is an example in the code you are thinking of?
   
   I am really sorry, @alamb, I should have offered more context in the first place. :/
   
   This in no way blocks this PR: IMO it is ready to merge if the relevant 
tests pass.
   
   What I meant is that this code currently:
   
   * creates `Vec<Option<T>>` through an iteration
   * copies `Vec<Option<T>>` to the two buffers (when `from_opt_vec` is called)
   
   It may be more efficient to create the buffers during the iteration, so that we avoid the copy (Vec -> buffers). In other words, the code in `from_opt_vec` could have been "injected" into the filter execution, where the `MutableBuffer` and the offsets and values buffers are created before the loop, and new elements are written directly to them. Does this make sense?
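
   A minimal sketch of the difference, using plain `Vec`s as stand-ins for the Arrow buffers. The `Buffers` struct and the function names `filter_via_opt_vec` and `filter_direct` are hypothetical and only illustrate the idea; they are not the crate's API, and the real kernel would also handle offsets for variable-length types:

```rust
/// Hypothetical stand-in for a null mask plus a values buffer.
/// Not the arrow crate's types: plain Vecs keep the sketch self-contained.
#[derive(Debug, PartialEq)]
pub struct Buffers {
    /// One validity bit per slot, least-significant bit first.
    pub validity: Vec<u8>,
    /// One value per slot; null slots hold a placeholder 0.
    pub values: Vec<i32>,
}

/// Two-step approach: collect a Vec<Option<i32>>, then copy it into buffers.
pub fn filter_via_opt_vec(input: &[Option<i32>], mask: &[bool]) -> Buffers {
    let selected: Vec<Option<i32>> = input
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(value, _)| *value)
        .collect();
    from_opt_vec(&selected)
}

/// The copy step (Vec<Option<i32>> -> buffers) that the direct version avoids.
fn from_opt_vec(data: &[Option<i32>]) -> Buffers {
    let mut validity = vec![0u8; (data.len() + 7) / 8];
    let mut values = Vec::with_capacity(data.len());
    for (i, item) in data.iter().enumerate() {
        if item.is_some() {
            validity[i / 8] |= 1u8 << (i % 8);
        }
        values.push(item.unwrap_or(0));
    }
    Buffers { validity, values }
}

/// Direct approach: buffers are allocated before the loop and each selected
/// element is written straight into them, with no intermediate Vec<Option<_>>.
pub fn filter_direct(input: &[Option<i32>], mask: &[bool]) -> Buffers {
    let count = mask.iter().filter(|keep| **keep).count();
    let mut validity = vec![0u8; (count + 7) / 8];
    let mut values = Vec::with_capacity(count);
    for (item, keep) in input.iter().zip(mask) {
        if !*keep {
            continue;
        }
        let i = values.len(); // index of the slot being written
        if item.is_some() {
            validity[i / 8] |= 1u8 << (i % 8);
        }
        values.push(item.unwrap_or(0));
    }
    Buffers { validity, values }
}
```

   Both functions produce the same buffers; the second one just skips the intermediate allocation and the second pass over the selected elements.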
   
   (As a side note, this is why I am proposing #8211: IMO there is some boilerplate copy-pasting to
   
   1. initialize buffers
   2. iterate
   3. create `ArrayData` from buffers
   
   which will continue to grow as we add more kernels, and whose pattern seems to be a "`FromIter` of fixed size".)
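
   One possible shape for such a helper, sketched with plain `Vec`s. The names `RawArray` and `from_exact_iter` are made up for illustration, and this is not necessarily what #8211 proposes. It relies on the standard `ExactSizeIterator` trait to know the length up front, so all three steps live in one place:

```rust
/// Hypothetical stand-in for the buffers that would back an `ArrayData`.
#[derive(Debug, PartialEq)]
pub struct RawArray<T> {
    /// One validity bit per slot, least-significant bit first.
    pub validity: Vec<u8>,
    /// One value per slot; null slots hold `T::default()`.
    pub values: Vec<T>,
}

/// Builds the buffers from an iterator whose length is known in advance.
/// Assumes the iterator reports its length correctly; the indexing below
/// panics if it yields more items than `len()` announced.
pub fn from_exact_iter<T, I>(iter: I) -> RawArray<T>
where
    T: Copy + Default,
    I: ExactSizeIterator<Item = Option<T>>,
{
    // 1. initialize buffers
    let len = iter.len();
    let mut validity = vec![0u8; (len + 7) / 8];
    let mut values = Vec::with_capacity(len);

    // 2. iterate, writing each element directly into the buffers
    for (i, item) in iter.enumerate() {
        if item.is_some() {
            validity[i / 8] |= 1u8 << (i % 8);
        }
        values.push(item.unwrap_or_default());
    }

    // 3. assemble the result from the buffers
    RawArray { validity, values }
}
```

   A kernel could then be written as a single expression, for example `from_exact_iter(input.iter().map(|x| x.map(|v| v * 2)))`, without repeating the buffer setup.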

