pchintar commented on code in PR #9778:
URL: https://github.com/apache/arrow-rs/pull/9778#discussion_r3127959522
##########
arrow-ipc/src/reader.rs:
##########
@@ -875,8 +875,12 @@ fn read_block<R: Read + Seek>(mut reader: R, block:
&Block) -> Result<Buffer, Ar
let metadata_len = block.metaDataLength().to_usize().unwrap();
let total_len = body_len.checked_add(metadata_len).unwrap();
- let mut buf = MutableBuffer::from_len_zeroed(total_len);
- reader.read_exact(&mut buf)?;
+ let mut buf = MutableBuffer::with_capacity(total_len);
+ // Buffer is immediately fully initialized by `read_exact` before any read
occurs
Review Comment:
Thanks @jhorstmann and @alamb — I revisited this and reworked my approach
to avoid `unsafe` entirely.
The earlier `set_len + read_exact` pattern is not sound for generic `Read`.
A fully safe alternative must also account for alignment: Arrow buffers require
64-byte alignment (`arrow_buffer::buffer::ALIGNMENT`), while `Vec<u8>` does not
guarantee this.
To handle both safely:
* read into `Vec<u8>` via `take(...).read_to_end(...)` (works for all `Read`)
* reuse the allocation only if it is 64-byte aligned
* otherwise copy into `MutableBuffer::from_len_zeroed(...)` to ensure proper
alignment
To avoid overhead on small reads, my implementation retains the existing
path for smaller buffer sizes and applies the optimized path only to larger
buffers (8192 bytes or higher), which constitute the majority of cases in the
benchmark.
### Benchmark (official `ipc_reader`)
```bash
cargo bench -p arrow-ipc --bench ipc_reader --features zstd
```
```text
StreamReader/read_10 ~910 µs → ~772 µs (~15% faster)
StreamReader/no_validation ~441 µs → ~300 µs (~32% faster)
FileReader/read_10 ~900 µs → ~773 µs (~14% faster)
FileReader/no_validation ~576 µs → ~300 µs (~48% faster)
zstd cases: small but consistent improvements
mmap: unchanged or slightly improved
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]