alamb commented on code in PR #8031:
URL: https://github.com/apache/arrow-rs/pull/8031#discussion_r2248075157
##########
parquet-variant/src/builder.rs:
##########
@@ -64,19 +64,55 @@ fn write_offset(buf: &mut Vec<u8>, value: usize, nbytes:
u8) {
buf.extend_from_slice(&bytes[..nbytes as usize]);
}
-/// Write little-endian integer to buffer at a specific position
-fn write_offset_at_pos(buf: &mut [u8], start_pos: usize, value: usize, nbytes:
u8) {
- let bytes = value.to_le_bytes();
- buf[start_pos..start_pos + nbytes as
usize].copy_from_slice(&bytes[..nbytes as usize]);
-}
-
/// Append `value_size` bytes of given `value` into `dest`.
fn append_packed_u32(dest: &mut Vec<u8>, value: u32, value_size: usize) {
let n = dest.len() + value_size;
dest.extend(value.to_le_bytes());
dest.truncate(n);
}
+/// An iterator that yields the bytes of a packed u32 iterator.
+/// Will yield the first `packed_bytes` bytes of each item in the iterator.
+struct PackedU32Iterator<T: Iterator<Item = [u8; 4]>> {
+ packed_bytes: usize,
Review Comment:
> However -- the temp buf append+truncate is fundamentally faster on a
per-entry basis, because of how much simpler it is at machine code level. And
it will run at exactly the same speed for all packing sizes, because there are
no branches or even conditional moves based on the packing size (**). The only
downside is the up-front cost of allocating said temp buffer first, which has
to amortize across all entries.
Yes I agree with this analysis.
This is why my opinion still remains that the way to avoid the allocation is
to calculate the header size and shift the bytes first so we write directly to
the target. I realize that is trickier code, but as @klion26 has said somewhere
else I think we could write some good assertions and tests to avoid problems
When I next get a chance to work on Variant related things with time, I want
to focus on unsticking the shredding implementation. Once that is done, I may
return here to try and restructure things
One thought I had was to encapsulate the logic / templating in a
```rust
struct ListHeaderWriter {
offsets: &[usize],
}
impl ListHeaderWriter {
fn header_size::<const OFFSET_SIZE:usize>(&self) -> usize {
// put the header size calculation here
}
// write the header into dst starting at start_offset
fn write(dst: &mut Vec<u8>, start_offset: usize)::<const
OFFSET_SIZE:usize>(&self) {
...
}
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]