pchintar commented on code in PR #9971:
URL: https://github.com/apache/arrow-rs/pull/9971#discussion_r3249928202
##########
arrow-ipc/src/reader.rs:
##########
@@ -902,16 +1107,29 @@ fn get_dictionary_values(
Ok(dictionary_values)
}
-/// Read the data for a given block
+/// Read the data for a given IPC file block.
+///
+/// The returned buffer is fully initialized by the reader. This avoids first
+/// zero-filling the allocation and then immediately overwriting it with block
+/// data.
fn read_block<R: Read + Seek>(mut reader: R, block: &Block) -> Result<Buffer,
ArrowError> {
reader.seek(SeekFrom::Start(block.offset() as u64))?;
let body_len = block.bodyLength().to_usize().unwrap();
let metadata_len = block.metaDataLength().to_usize().unwrap();
let total_len = body_len.checked_add(metadata_len).unwrap();
- let mut buf = MutableBuffer::from_len_zeroed(total_len);
- reader.read_exact(&mut buf)?;
- Ok(buf.into())
+ let mut buf = Vec::with_capacity(total_len);
+ reader
+ .by_ref()
+ .take(total_len as u64)
+ .read_to_end(&mut buf)?;
+ if buf.len() != total_len {
+ return Err(ArrowError::IpcError(format!(
+ "Expected IPC block of length {total_len}, got {}",
+ buf.len()
+ )));
+ }
+ Ok(Buffer::from_vec(buf))
Review Comment:
Yes, but this is always safe though & there's a net gain in speed
nonetheless, so for
aligned case:
old/current path: zero-fill + read
new path: read only
=> faster
unaligned case:
old/current path: zero-fill + read
new path: read + alignment copy
=> ~same almost
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]