tustvold commented on code in PR #4147:
URL: https://github.com/apache/arrow-rs/pull/4147#discussion_r1179721202


##########
parquet/src/file/reader.rs:
##########
@@ -43,13 +43,27 @@ pub trait Length {
     fn len(&self) -> u64;
 }
 
-/// The ChunkReader trait generates readers of chunks of a source.
-/// For a file system reader, each chunk might contain a clone of File bounded on a given range.
-/// For an object store reader, each read can be mapped to a range request.
+/// The ChunkReader trait provides synchronous access to contiguous byte ranges of a source
 pub trait ChunkReader: Length + Send + Sync {
     type T: Read + Send;
     /// Get a serially readable slice of the current reader
-    /// This should fail if the slice exceeds the current bounds
+    ///
+    /// # IO Granularity
+    ///
+    /// The `length` parameter provides an upper bound on the number of bytes that
+    /// will be read; however, it is intended purely as a hint.
+    ///
+    /// It is not guaranteed that `length` bytes will actually be read, nor are any guarantees
+    /// made on the size of `length`; it may be as large as a row group or as small as a couple
+    /// of bytes. It therefore should not be used on its own to determine the granularity of
+    /// IO to the underlying storage system.
+    ///
+    /// Systems looking to mask high IO latency through prefetching, such as is encountered with
+    /// object storage, should consider fetching the relevant byte ranges into [`Bytes`]

Review Comment:
   I more meant handling this outside of get_read, i.e. don't use ChunkReader for these use-cases 😅
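
   For illustration, the "length is a hint" contract described in the new doc comment could look like the following. This is a self-contained sketch: the `Length`/`ChunkReader`/`get_read` names mirror the shapes in `parquet/src/file/reader.rs`, but the traits and the in-memory implementation below are defined locally here and are not the crate's actual code:

   ```rust
   use std::io::{Cursor, Read};

   // Local stand-in for parquet's Length trait (sketch only).
   trait Length {
       fn len(&self) -> u64;
   }

   // Local stand-in for parquet's ChunkReader trait (sketch only).
   trait ChunkReader: Length {
       type T: Read;
       /// `length` is an upper bound on the bytes that will be read,
       /// intended purely as a hint.
       fn get_read(&self, start: u64, length: usize) -> std::io::Result<Self::T>;
   }

   // In-memory source: the returned reader is clamped to the buffer's
   // end, so fewer than `length` bytes may be available.
   struct InMemory(Vec<u8>);

   impl Length for InMemory {
       fn len(&self) -> u64 {
           self.0.len() as u64
       }
   }

   impl ChunkReader for InMemory {
       type T = Cursor<Vec<u8>>;
       fn get_read(&self, start: u64, length: usize) -> std::io::Result<Self::T> {
           let start = start as usize;
           let end = (start + length).min(self.0.len());
           Ok(Cursor::new(self.0[start..end].to_vec()))
       }
   }

   fn main() {
       let reader = InMemory((0u8..16).collect());
       let mut buf = Vec::new();
       // Ask for 100 bytes starting at offset 4; only 12 remain.
       reader.get_read(4, 100).unwrap().read_to_end(&mut buf).unwrap();
       println!("{}", buf.len()); // prints 12: the hint was not honored exactly
   }
   ```

   A caller that needs a fixed IO granularity (e.g. for object-store range requests) therefore has to decide it outside `get_read`, which is the point being made above.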



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
