sidd328 opened a new issue, #18145:
URL: https://github.com/apache/datafusion/issues/18145

   ### Is your feature request related to a problem or challenge?
   
   As a user I would like the ability to construct my own 
CachedParquetFileReader. This is so that we can provide our own inner (object 
reader) with specific index preloads. So for example, with the 
non-CachedParquetFileReader (for a regular ```ParquetFileReaderFactory``` I can 
do this in the ```create_reader``` function):
   
   ```
   let store = Arc::clone(&self.object_store);
   let mut reader = ParquetObjectReader::new(store, 
file_meta.object_meta.location)
       .with_file_size(file_meta.object_meta.size)
       .with_preload_column_index(true)
       .with_preload_offset_index(true);
   
   if let Some(hint) = metadata_size_hint {
       reader = reader.with_footer_size_hint(hint)
   };
   Ok(Box::new(reader))
   ```
   
   And for CachedParquetReader I would like the ability to do this:
   ```
   let store = Arc::clone(&self.store);
   let mut inner = ParquetObjectReader::new(store, 
file_meta.object_meta.location.clone())
       .with_file_size(file_meta.object_meta.size)
       .with_preload_column_index(true)
       .with_preload_offset_index(true);
   
   if let Some(hint) = metadata_size_hint {
       inner = inner.with_footer_size_hint(hint)
   };
   
   Ok(Box::new(CachedParquetFileReader {
       store: Arc::clone(&self.store),
       inner,
       file_metrics,
       file_meta,
       metadata_cache: Arc::clone(&self.metadata_cache),
   }))
   ```
   
   But since CachedParquetFileReader has private fields and does not have a 
```new``` method or constructor, I cannot create my own CachedParquetFileReader 
with preload index. 
   
   ### Describe the solution you'd like
   
   Adjust the CachedParquetFileReader to add a ```new``` method or constructor. 
i.e., to 
https://github.com/apache/datafusion/blob/main/datafusion/datasource-parquet/src/reader.rs
 file we can add a method to the CachedParquetReader like so:
   
   ```
   impl CachedParquetFileReader {
       pub fn new(
           file_metrics: ParquetFileMetrics,
           store: Arc<dyn ObjectStore>,
           inner: ParquetObjectReader,
           file_meta: FileMeta,
           metadata_cache: Arc<dyn FileMetadataCache>
       ) -> Self {
           Self {
               file_metrics,
               store,
               inner,
               file_meta,
               metadata_cache
           }
       }
   }
   ```
   
   Other option is to make all fields public but I believe the ```new``` method 
might be cleaner (I don't have context on why some fields are private I'm 
afraid)
   
   ### Describe alternatives you've considered
   
   Current workaround is to have our own ```CachedParquetFileReader``` with 
copied AsyncFileReader implementation.
   
   Its not the worst workaround but has us maintaining our own 
```CachedParquetFileReader``` which is not ideal. 
   
   ### Additional context
   
   I can submit a PR for these changes since its a pretty small change but 
would like someone to verify if there are any issues with my logic or there are 
any other concerns with adding a constructor 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to