Copilot commented on code in PR #530:
URL: https://github.com/apache/avro-rs/pull/530#discussion_r3037518274
##########
avro/src/reader/mod.rs:
##########
@@ -159,6 +164,39 @@ impl<R: Read> Iterator for Reader<'_, R> {
}
}
+impl<R: Read + Seek> Reader<'_, R> {
+ /// The currently loaded block's position and record count.
+ ///
+ /// Returns `None` before the first record has been read.
+ /// Updated automatically each time a new block is loaded during iteration.
+ pub fn current_block(&self) -> Option<BlockPosition> {
+ self.block.current_block_info
+ }
Review Comment:
The doc comment for `current_block()` says it returns `None` “before the
first record has been read”, but `seek_to_block()` loads a block eagerly
(calling `read_block_next`), so after a successful seek this will return
`Some(...)` even before reading any records. Please clarify the semantics in
the docs (e.g., “None before the first block is loaded via iteration or
`seek_to_block`).”
##########
avro/src/reader/block.rs:
##########
@@ -35,10 +35,57 @@ use crate::{
util,
};
+/// Position and size of a single Avro data block within the file stream.
+///
+/// Captured automatically as blocks are read during forward iteration.
+/// Use with [`super::Reader::seek_to_block`] to jump back to a
previously-read block.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct BlockPosition {
+ /// Byte offset in the stream where this block starts (before the
object-count varint).
+ pub offset: u64,
+ /// Total number of records in this block.
+ pub message_count: usize,
+}
Review Comment:
`BlockPosition`’s doc comment says it captures “Position and size of a
single Avro data block”, but the struct currently only contains `offset` and
`message_count` (record count), not the block’s byte size. Consider adjusting
the docs to match the fields (or add a byte-size field if that was intended).
##########
avro/src/reader/mod.rs:
##########
@@ -159,6 +164,39 @@ impl<R: Read> Iterator for Reader<'_, R> {
}
}
+impl<R: Read + Seek> Reader<'_, R> {
+ /// The currently loaded block's position and record count.
+ ///
+ /// Returns `None` before the first record has been read.
+ /// Updated automatically each time a new block is loaded during iteration.
+ pub fn current_block(&self) -> Option<BlockPosition> {
+ self.block.current_block_info
+ }
+
+ /// Byte offset where data blocks begin (right after the file header).
+ ///
+ /// This is the offset of the first data block — equivalent to the position
+ /// that would be returned by `current_block().offset` for block 0.
+ pub fn data_start(&self) -> u64 {
+ self.block.data_start
+ }
Review Comment:
`current_block()` / `data_start()` are only available when `R: Read + Seek`,
but the implementation now tracks byte offsets via `PositionTracker` even for
non-seekable readers. Unless there’s a deliberate API reason to hide block
position info for non-seekable inputs, consider moving these accessors to the
`impl<R: Read>` block and keeping only `seek_to_block()` gated on `Seek`.
##########
avro/src/reader/block.rs:
##########
@@ -295,6 +355,23 @@ impl<'r, R: Read> Block<'r, R> {
}
}
+impl<R: Read + Seek> Block<'_, R> {
+ /// Seek the underlying stream to `offset` and read the block there.
+ /// Validates the sync marker to confirm it's a real block boundary.
+ pub(super) fn seek_to_block(&mut self, offset: u64) -> AvroResult<()> {
+ self.reader
+ .seek(SeekFrom::Start(offset))
+ .map_err(Details::SeekToBlock)?;
+
+ self.buf.clear();
+ self.buf_idx = 0;
+ self.message_count = 0;
+ self.current_block_info = None;
+
+ self.read_block_next()
Review Comment:
`seek_to_block` can succeed even when `offset` is not a valid block start
(e.g., an offset at/after EOF). In that case `read_block_next()` treats
`UnexpectedEof` as a clean end-of-stream and returns `Ok(())`, leaving the
reader empty, which contradicts the method’s contract (“must point to the start
of a valid data block”). Consider making `seek_to_block` return an error when
no block header can be read at the requested offset (or add a dedicated
“invalid block offset / unexpected EOF while seeking” error variant).
```suggestion
self.read_block_next()?;
if self.current_block_info.is_none() {
return Err(std::io::Error::new(
ErrorKind::UnexpectedEof,
format!("offset {offset} does not point to a valid block
start"),
)
.into());
}
Ok(())
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]