jorgecarleitao commented on pull request #8630:
URL: https://github.com/apache/arrow/pull/8630#issuecomment-725500367
Thanks a lot, @alamb and @vertexclique . I agree with the naming issues
here, and great insight into those crates. I do not have strong feelings about
naming; I tried to be close to what I am familiar with from Rust (e.g.
`Vec::extend`).
A bit of background: @yordan-pavlov did an impressive job on #7798 to make
filtering performant; It is _really_ hard to get that performance with a
generalization. I had to re-write this code at least 3 times to get it to a
stage with comparable performance and even then, you can see that it depends on
the filter conditions. But there is a (IMO good) reason for generalizing the
code in `filter`.
Specifically, the use-case I am looking at is to remove most code from the
`take` kernel, since it is doing what this struct is doing (and what `filter`
was doing). There are implementation details that are different (`take`'s
indices can be null and `filter` does not support `ArrayStruct` atm), but they
are fundamentally doing the same thing: constructing an `ArrayData` by
memcopying chunks of another `ArrayData`.
Also, note that this goes beyond masking: it supports taking values
repeatedly. I see it as a "builder" bound to a specific `ArrayData`
(`ArrayDataBuilder` is already taken, though). It is not a builder like the
builders in `builder.rs` because those memcopy data from rust native types, not
Arrow types.
The reason this performant remains comparable(-ish) to master is that when
it binds to an `ArrayData`, it inspects its `DataType` to initialize functions
(the `type Extend`) bound to that `ArrayData` that are performant.
For example,
```rust
let values = &array.buffers()[0].data()[array.offset() * size_of::<T>()..];
Box::new(move |mutable: &mut _MutableArrayData, start: usize, len: usize| {
```
instead of
```rust
Box::new(move |mutable: &mut _MutableArrayData, array: &ArrayData, start:
usize, len: usize| {
let values = &array.buffers()[0].data()[array.offset() *
size_of::<T>()..];
```
has a meaningful performance impact because it avoids checks on both
`buffers()[0]` and `[something..]` on every call of `extend`. This check may
seem small, but it significantly impacts performance because `extend` can be as
small as: `copy 2 bits` (in the case of a boolean array). When there are many
calls of this operation (e.g. filter every other slot), these checks are as or
more expensive than the `memcopy` themselves.
There is a further boost possible that I explored but did not finish, but it
requires a bit of `unsafe` code: we know that these functions never outlive
`MutableDataArray`. Therefore, we could actually unsafely pass `&self.data` to
their initialization and avoid all costs associated with accessing
`mutable.buffers()` and the like inside `extend`. In this case, we would use
```rust
type Extend<'a> = Box<Fn(usize, usize) -> () + 'a>;
```
and bind the `_MutableArrayData` on `build_primitive_extend<'a, T>(array,
data)`. This would be the closest to the current filter implementation that
remains generic for other use-cases.
My point is that there are really non-trivial optimizations done in this
proposal that were borrowed from the excellent work from @yordan-pavlov and
that are required to keep up with very high bar set by #7798 This draft is
trying to leverage that work on two of our kernels, `take` and `filter`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]