jorgecarleitao edited a comment on pull request #8630:
URL: https://github.com/apache/arrow/pull/8630#issuecomment-725500367


   Thanks a lot, @alamb and @vertexclique . I agree with the naming issues 
here, and great insight into those crates. I do not have strong feelings about 
naming; I tried to be close to what I am familiar with from Rust (e.g. 
`Vec::extend`).
   
   A bit of background: @yordan-pavlov did an impressive job on #7798 to make 
filtering performant; It is _really_ hard to get that performance with a 
generalization. I had to re-write this code at least 3 times to get it to a 
stage with comparable performance and even then, you can see that it depends on 
the filter conditions. But there is a (IMO good) reason for generalizing the 
code in `filter`.
   
   Specifically, the use-case I am looking at is to remove most code from the 
`take` kernel, since it is doing what this struct is doing (and what `filter` 
was doing). There are implementation details that are different (`take`'s 
indices can be null and `filter` does not support `ArrayStruct` atm), but they 
are fundamentally doing the same thing: constructing an `ArrayData` by 
memcopying chunks of another `ArrayData`.
   
   Also, note that this goes beyond masking: it supports taking values 
repeatedly. I see it as a "builder" bound to a specific `ArrayData` 
(`ArrayDataBuilder` is already taken, though). It is not a builder like the 
builders in `builder.rs` because those memcopy data from rust native types, not 
Arrow types.
   
   The reason this performance remains comparable(-ish) to master is that when 
it binds to an `ArrayData`, it inspects its `DataType` to initialize functions 
(the `type Extend`) bound to that `ArrayData` that are performant.
   
   For example,
   
   ```rust
   let values = &array.buffers()[0].data()[array.offset() * size_of::<T>()..];
   Box::new(move |mutable: &mut _MutableArrayData, start: usize, len: usize| {
   ```
   
   instead of
   
   ```rust
   Box::new(move |mutable: &mut _MutableArrayData, array: &ArrayData, start: 
usize, len: usize| {
       let values = &array.buffers()[0].data()[array.offset() * 
size_of::<T>()..];
   ```
   
   has a meaningful performance impact because it avoids checks on both 
`buffers()[0]` and `[something..]` on every call of `extend`. This check may 
seem small, but it significantly impacts performance because `extend` can be as 
small as: `copy 2 bits` (in the case of a boolean array). When there are many 
calls of this operation (e.g. filter every other slot), these checks are as or 
more expensive than the `memcopy` themselves.
   
   There is a further boost possible that I explored but did not finish, but it 
requires a bit of `unsafe` code: we know that these functions never outlive 
`MutableDataArray`. Therefore, we could actually unsafely pass `&self.data` to 
their initialization and avoid all costs associated with accessing 
`mutable.buffers()` and the like inside `extend`. In this case, we would use
   
   ```rust
   type Extend<'a> = Box<Fn(usize, usize) -> () + 'a>;
   ```
   
   and bind the `_MutableArrayData` on `build_primitive_extend<'a, T>(array, 
data)`. This would be the closest to the current filter implementation that 
remains generic for other use-cases.
   
   My point is that there are really non-trivial optimizations done in this 
proposal that were borrowed from the excellent work from @yordan-pavlov and 
that are required to keep up with very high bar set by #7798  This draft is 
trying to leverage that work on two of our kernels, `take` and `filter`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to