[GitHub] [arrow-datafusion] yjshen commented on a change in pull request #1905: Avoid repeated `open` for one single file and simplify object reader API on the `sync` part

GitBox Wed, 02 Mar 2022 09:34:36 -0800


yjshen commented on a change in pull request #1905:
URL: https://github.com/apache/arrow-datafusion/pull/1905#discussion_r817933625




##########
File path: datafusion/src/datasource/object_store/mod.rs
##########
@@ -39,27 +39,34 @@ use crate::error::{DataFusionError, Result};
 /// Note that the dynamic dispatch on the reader might
 /// have some performance impacts.
 #[async_trait]
-pub trait ObjectReader: Send + Sync {
+pub trait ObjectReader: Read + Seek + Send {
     /// Get reader for a part [start, start + length] in the file asynchronously
     async fn chunk_reader(&self, start: u64, length: usize)
         -> Result<Box<dyn AsyncRead>>;
 
-    /// Get reader for a part [start, start + length] in the file
-    fn sync_chunk_reader(
-        &self,
-        start: u64,
-        length: usize,
-    ) -> Result<Box<dyn Read + Send + Sync>>;
-
-    /// Get reader for the entire file
-    fn sync_reader(&self) -> Result<Box<dyn Read + Send + Sync>> {
-        self.sync_chunk_reader(0, self.length() as usize)
-    }

Review comment:
       After discussing this with @houqp and @richox, we agreed that the extra 
`chunk` semantics in ObjectReader leak irrelevant file format details into 
object stores and add needless complexity. Hence the API simplification here: 
the sync chunk methods are removed and the reader itself is `Read + Seek`.
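
To illustrate the idea, here is a minimal sketch of how a caller could still read a byte range once the reader is `Read + Seek`, without the object store knowing anything about chunks. It uses only the standard library. `read_range` is a hypothetical helper, not something this PR adds, and it treats the range as half-open, `[start, start + length)`:

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical helper (not part of DataFusion): read exactly `length` bytes
/// starting at byte offset `start` from any reader that is `Read + Seek`.
/// Returns an error if the reader ends before `length` bytes are available.
fn read_range<R: Read + Seek>(
    reader: &mut R,
    start: u64,
    length: usize,
) -> io::Result<Vec<u8>> {
    // Position the single, already-open reader instead of opening a new one.
    reader.seek(SeekFrom::Start(start))?;
    let mut buf = vec![0u8; length];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}
```

With an in-memory `std::io::Cursor` over `b"abcdefgh"`, `read_range(&mut cursor, 2, 3)` yields the bytes `cde`. The same reader can be repositioned for the next range, which is the point of avoiding a repeated `open` per chunk.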




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

