XiangpengHao opened a new pull request, #6031:
URL: https://github.com/apache/arrow-rs/pull/6031
# Which issue does this PR close?
<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases. You can
link an issue to this PR using the GitHub syntax. For example `Closes #123`
indicates that this PR will close issue #123.
-->
Closes #.
# Rationale for this change
I made a mistake in a previous pr which writes:
```rust
let block_id = output.append_block(self.buf.clone().into());
```
I thought the `into()` will convert the `Bytes` into
`array_buffer::Buffer()` without copying the data.
But `self.buf` is `bytes::Bytes`, not `arrow_buffer::Bytes` (they
confusingly having the same name). The consequence is that the code above will
run into this procedure:
https://github.com/XiangpengHao/arrow-rs/blob/view-buffer/arrow-buffer/src/buffer/immutable.rs#L361-L370,
which implicitly copies data.
(I think we should do something to prevent future similar mistakes, but I
don't know how).
<!--
Why are you proposing this change? If this is already explained clearly in
the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand your
changes and offer better suggestions for fixes.
-->
# What changes are included in this PR?
This PR explicitly converts `bytes::Bytes` into `arrow_buffer::Bytes` and
then converts to `arrow_buffer::Buffer`; I have manually verified that each of
the conversions requires no copy.
This further improves the benchmark performance:
```bash
cargo bench --bench arrow_reader --features="arrow test_common experimental"
"arrow_array_reader/BinaryViewArray/"
```
```
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs
time: [129.51 µs 129.57 µs 129.63 µs]
change: [-35.172% -35.134% -35.094%] (p = 0.00 <
0.05)
Performance has improved.
arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs
time: [131.07 µs 131.11 µs 131.15 µs]
change: [-59.798% -59.758% -59.723%] (p = 0.00 <
0.05)
Performance has improved.
arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs
time: [125.08 µs 125.11 µs 125.15 µs]
change: [-32.699% -32.660% -32.621%] (p = 0.00 <
0.05)
Performance has improved.
```
### Bonus (not related to this PR)
You may wonder how it is possible to be [faster than
StringArray](https://github.com/apache/arrow-rs/pull/5972) if the current
implementation still copies data. TLDR: bc `memcpy` inlining.
In our benchmark, [every string is larger than 12
bytes](https://github.com/XiangpengHao/arrow-rs/blob/view-buffer/parquet/benches/arrow_reader.rs#L288),
this means that when making the views, we always fall into [this
branch](https://github.com/XiangpengHao/arrow-rs/blob/view-buffer/arrow-array/src/builder/generic_bytes_view_builder.rs#L419-L424).
The specialty about this branch is that it only reads the first 4 bytes of the
string, and LLVM is smart enough to inline loading the values (i.e., prevent
calling into `copy_non_overlapping`).
#### Is that a big deal?
Unfortunately, yes. If you change the benchmark string to very small, e.g.,
`"test"`, you should be able to see (using the same benchmark script above) the
performance of loading `ViewArray` drops significantly, and it doesn't make
sense -- how could build shorter string much slower than building longer string?
#### How to fix the short string regression?
The root cause is that when string is short, we need to `memcpy` the bytes
to the `view`, as the compiler has no idea how long the string is, it can not
inline the bytes and has to call to `memcpy`, which is slow.
To convince you more, here's a special variant of `make_view`, you can
replace [this
function](https://github.com/apache/arrow-rs/blob/master/arrow-array/src/builder/generic_bytes_view_builder.rs#L411)
with the following new implementation, and the performance will back to
normal. This new implementation makes 13 copies of make_view_inner; each copy
is a specialized length known to the compiler (`LEN` is const), and then the
compiler can optimize each of the implementations as needed. Looking at the
assembly code, there is indeed no call to `memcpy`.
```rust
fn make_view_inner<const LEN: usize>(data: &[u8]) -> u128 {
let mut view_buffer = [0; 16];
view_buffer[0..4].copy_from_slice(&(LEN as u32).to_le_bytes());
view_buffer[4..4 + LEN].copy_from_slice(&data[..LEN]);
u128::from_le_bytes(view_buffer)
}
/// Create a view based on the given data, block id and offset
#[inline(always)]
pub fn make_view(data: &[u8], block_id: u32, offset: u32) -> u128 {
let len = data.len() as u32;
match len {
0 => make_view_inner::<0>(data),
1 => make_view_inner::<1>(data),
2 => make_view_inner::<2>(data),
3 => make_view_inner::<3>(data),
4 => make_view_inner::<4>(data),
5 => make_view_inner::<5>(data),
6 => make_view_inner::<6>(data),
7 => make_view_inner::<7>(data),
8 => make_view_inner::<8>(data),
9 => make_view_inner::<9>(data),
10 => make_view_inner::<10>(data),
11 => make_view_inner::<11>(data),
12 => make_view_inner::<12>(data),
_ => {
let view = ByteView {
length: len,
prefix: u32::from_le_bytes(data[0..4].try_into().unwrap()),
buffer_index: block_id,
offset,
};
view.into()
}
}
}
```
(special thanks to @aoli-al for triangulating the root cause and prototype
the fix)
What should we do?
I don't know, I'm not sure if we want to merge that special `make_view`, as
it is very unintuitive. I'll keep a local copy of this `make_view` and think
about it in the background, maybe I'll have a better idea in a month.
<!--
There is no need to duplicate the description in the issue here but it is
sometimes worth providing a summary of the individual changes in this PR.
-->
# Are there any user-facing changes?
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->
<!---
If there are any breaking changes to public APIs, please add the `breaking
change` label.
-->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]