tustvold commented on code in PR #4147:
URL: https://github.com/apache/arrow-rs/pull/4147#discussion_r1179721202
##########
parquet/src/file/reader.rs:
##########
@@ -43,13 +43,27 @@ pub trait Length {
     fn len(&self) -> u64;
 }
 
-/// The ChunkReader trait generates readers of chunks of a source.
-/// For a file system reader, each chunk might contain a clone of File bounded on a given range.
-/// For an object store reader, each read can be mapped to a range request.
+/// The ChunkReader trait provides synchronous access to contiguous byte ranges of a source
 pub trait ChunkReader: Length + Send + Sync {
     type T: Read + Send;
     /// Get a serially readable slice of the current reader
-    /// This should fail if the slice exceeds the current bounds
+    ///
+    /// # IO Granularity
+    ///
+    /// The `length` parameter provides an upper bound on the number of bytes that
+    /// will be read; however, it is intended purely as a hint.
+    ///
+    /// It is not guaranteed that `length` bytes will actually be read, nor are any
+    /// guarantees made on the size of `length`: it may be as large as a row group or
+    /// as small as a couple of bytes. It therefore should not be used on its own to
+    /// determine the granularity of IO to the underlying storage system.
+    ///
+    /// Systems looking to mask high IO latency through prefetching, such as that
+    /// encountered with object storage, should consider fetching the relevant byte
+    /// ranges into [`Bytes`]
Review Comment:
I meant more handling this outside of `get_read`, i.e. don't use `ChunkReader` for these use-cases 😅
Will see if I can't clarify the wording tomorrow
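
For illustration, here is a minimal sketch of that pattern, i.e. doing the prefetch outside of `ChunkReader` entirely. `fetch_range` is a hypothetical helper standing in for an object store range request (it is not part of this crate); the example leans on the `ChunkReader` impl for `Bytes`:

```rust
use bytes::Bytes;
use parquet::errors::Result;
use parquet::file::reader::{FileReader, SerializedFileReader};

/// Hypothetical helper standing in for an object store range request,
/// e.g. an HTTP GET with a Range header. Not part of this crate.
fn fetch_range(location: &str, start: u64, length: usize) -> Result<Bytes> {
    // ... perform a single, well-sized read against the storage system ...
    unimplemented!()
}

fn read_prefetched(location: &str, file_size: u64) -> Result<()> {
    // Prefetch the relevant byte range up front, masking the high
    // per-request latency of object storage with one large request
    let data: Bytes = fetch_range(location, 0, file_size as usize)?;

    // `Bytes` implements `ChunkReader`, so every read issued by the
    // decoder below is served from memory, and the granularity of
    // `get_read` calls no longer translates into storage IO
    let reader = SerializedFileReader::new(data)?;
    for i in 0..reader.metadata().num_row_groups() {
        let _row_group = reader.get_row_group(i)?;
        // ... decode columns as usual ...
    }
    Ok(())
}
```

That way the IO scheduling decision lives entirely with the caller, and `get_read` only ever slices an in-memory buffer.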