etseidl commented on code in PR #6474:
URL: https://github.com/apache/arrow-rs/pull/6474#discussion_r1779726076


##########
parquet/src/file/metadata/reader.rs:
##########
@@ -299,11 +299,14 @@ impl ParquetMetaDataReader {
     /// See [`Self::with_prefetch_hint`] for a discussion of how to reduce the 
number of fetches
     /// performed by this function.
     #[cfg(feature = "async")]
-    pub async fn load_and_finish<F: MetadataFetch>(
+    pub async fn load_and_finish<F>(
         mut self,
-        fetch: F,
+        fetch: &mut F,
         file_size: usize,
-    ) -> Result<ParquetMetaData> {
+    ) -> Result<ParquetMetaData>
+    where
+        for<'a> &'a mut F: MetadataFetch,

Review Comment:
   > What don't you like about it? Is the idea that `MetadataFetch` actually 
has mutable state so that calling `fetch` would change its state?
   
   Some of it is my lack of Rust experience. I'm having a hard time tracking 
the ownership here, and am confusing traits with objects. Implementing the 
`MetadataFetch` trait for a mutable reference to something that implements the 
`AsyncFileReader` trait leaves my head spinning a little.
   
   > I think in general trying to get async functions to be happy with 
references is often much harder than with owned objects.
   
   Indeed, it took me days to get this to compile 😅.
   
   > 
   > So I guess I think we should keep the API as owned to make working with 
these APIs simpler, unless there is some compelling reason for the change 
(better performance, etc)
   
   I'll revert now that I understand the mechanics a little better. But one 
consequence is some things will get a little more verbose. For instance, when 
not taking ownership, `get_metadata` for `ParquetObjectReader` is
   ```rust
       fn get_metadata(&mut self, page_indexes: bool) -> BoxFuture<'_, 
Result<Arc<ParquetMetaData>>> {
           Box::pin(async move {
               let metadata = ParquetMetaDataReader::new()
                   .with_column_indexes(self.preload_column_index || 
page_indexes)
                   .with_offset_indexes(self.preload_offset_index || 
page_indexes)
                   .with_prefetch_hint(self.metadata_size_hint)
                   .load_and_finish(self, self.meta.size)
                   .await?;
               Ok(Arc::new(metadata))
           })
       }
   ```
   
   But if `load_and_finish` takes ownership of the mutable reference to `self`, 
we can't then get at `self.meta.size` in the same call, but need to save it 
earlier, IIUC.
   ```rust
       fn get_metadata(&mut self, page_indexes: bool) -> BoxFuture<'_, 
Result<Arc<ParquetMetaData>>> {
           Box::pin(async move {
               let file_size = self.meta.size;
               let metadata = ParquetMetaDataReader::new()
                   .with_column_indexes(self.preload_column_index || 
page_indexes)
                   .with_offset_indexes(self.preload_offset_index || 
page_indexes)
                   .with_prefetch_hint(self.metadata_size_hint)
                   .load_and_finish(self, file_size)
                   .await?;
               Ok(Arc::new(metadata))
           })
       }
   ```
   A small price to pay, but one I was hoping to avoid.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to