tustvold commented on code in PR #4156:
URL: https://github.com/apache/arrow-rs/pull/4156#discussion_r1180807171
##########
parquet/src/file/reader.rs:
##########
@@ -44,19 +46,47 @@ pub trait Length {
}
/// The ChunkReader trait generates readers of chunks of a source.
-/// For a file system reader, each chunk might contain a clone of File bounded on a given range.
-/// For an object store reader, each read can be mapped to a range request.
+///
+/// For more information see [`File::try_clone`]
pub trait ChunkReader: Length + Send + Sync {
- type T: Read + Send;
- /// Get a serially readable slice of the current reader
- /// This should fail if the slice exceeds the current bounds
- fn get_read(&self, start: u64, length: usize) -> Result<Self::T>;
+ type T: Read;
+
+ /// Get a [`Read`] starting at the provided file offset
+ ///
+ /// Subsequent or concurrent calls to [`Self::get_read`] or [`Self::get_bytes`] may
Review Comment:
FileSource provided protection against subsequent calls to `get_read` by
calling `Seek` on every read, but offered no protection against concurrent
access. I think it is less risky to clearly not support non-serial usage at
all than to break only under concurrent usage.
**TBC: there are no safety implications to leaving this access
unsynchronised.** You will just get interleaved reads, which is no worse than
reading gibberish.
One option would be to add a `Mutex` to synchronise access; however, this
solution is necessarily incomplete, as a user can simply call
`File::try_clone` to obtain an unsynchronised handle. Ultimately there is no
reliable way to synchronise file IO. I think that if no synchronisation is
fine for the standard library, it is fine for the parquet crate.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]