tustvold opened a new issue, #2992:
URL: https://github.com/apache/arrow-datafusion/issues/2992

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   We are seeing a number of projects with differing requirements for how the 
interaction with object store and parquet should proceed:
   
   * Fetching multiple byte ranges in parallel - 
https://github.com/apache/arrow-datafusion/issues/2949
   * Fetching data from sources that aren't typical object stores - 
https://github.com/apache/arrow-rs/issues/2230#issuecomment-1200144042
   
   The current interface clearly doesn't accommodate these use-cases, and the 
resulting friction is preventing users from getting things working. 
   
   **Describe the solution you'd like**
   
   The general philosophy of DataFusion is to be pluggable, and allow for easy 
extension where the defaults are not applicable to the use-case. This is 
particularly important for the interfaces to data storage, where a lot of 
application-specific trade-offs will occur.
   
   I would therefore like to propose adding an option to `ParquetExec` to 
specify `ParquetOpenFn` (name to be discussed).
   
   ```
   type ParquetOpenFn = Box<dyn Fn(ObjectMeta) -> Result<Box<dyn AsyncFileReader>>>;
   ```
   
   This will be called by `ParquetOpener` to construct the `AsyncFileReader` 
passed to `ParquetRecordBatchStream`.
   
   By default this would simply construct a `ParquetFileReader`, as it does 
currently, but the user would be able to override this with a custom 
implementation as desired. This would allow:
   
   * Interacting with ObjectStore differently - #2949 
   * Calling out to something that isn't even an ObjectStore such as a custom 
tiered storage engine - 
https://github.com/apache/arrow-rs/issues/2230#issuecomment-1200144042
   * Almost certainly something else
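
To make the shape of the proposal concrete, here is a minimal, std-only sketch of how such a hook might be supplied and invoked. Everything in it is an assumption for illustration: `ObjectMeta` and `AsyncFileReader` are simplified stand-ins (the reader is synchronous here, unlike the real trait in `parquet`), `InMemoryReader`, `in_memory_open_fn` and `read_trailing_magic` are hypothetical names, and the `Send + Sync` bound on the alias is my addition rather than part of the signature above.

```rust
use std::collections::HashMap;
use std::io;
use std::ops::Range;
use std::sync::Arc;

// Stand-in for `ObjectMeta`; only the fields this sketch needs.
#[derive(Debug, Clone)]
pub struct ObjectMeta {
    pub location: String,
    pub size: usize,
}

// Simplified, synchronous stand-in for parquet's `AsyncFileReader`.
pub trait AsyncFileReader: Send {
    fn get_bytes(&mut self, range: Range<usize>) -> io::Result<Vec<u8>>;
}

// The proposed hook (the `Send + Sync` bound is an assumption of this sketch).
pub type ParquetOpenFn =
    Box<dyn Fn(ObjectMeta) -> io::Result<Box<dyn AsyncFileReader>> + Send + Sync>;

// Hypothetical custom backend that is not an object store: files held in memory.
pub struct InMemoryReader {
    data: Arc<Vec<u8>>,
}

impl AsyncFileReader for InMemoryReader {
    fn get_bytes(&mut self, range: Range<usize>) -> io::Result<Vec<u8>> {
        // `get` returns None for out-of-bounds or inverted ranges.
        self.data
            .get(range)
            .map(|bytes| bytes.to_vec())
            .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "range out of bounds"))
    }
}

// Build a `ParquetOpenFn` that resolves locations against the in-memory map.
pub fn in_memory_open_fn(files: HashMap<String, Arc<Vec<u8>>>) -> ParquetOpenFn {
    Box::new(
        move |meta: ObjectMeta| -> io::Result<Box<dyn AsyncFileReader>> {
            let data = files
                .get(&meta.location)
                .cloned()
                .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, meta.location.clone()))?;
            let reader: Box<dyn AsyncFileReader> = Box::new(InMemoryReader { data });
            Ok(reader)
        },
    )
}

// Stand-in for what `ParquetOpener` would do: call the hook, then read through
// the returned reader. Here it fetches the last four bytes of the file.
pub fn read_trailing_magic(open: &ParquetOpenFn, meta: ObjectMeta) -> io::Result<Vec<u8>> {
    let size = meta.size;
    let mut reader = open(meta)?;
    reader.get_bytes(size.saturating_sub(4)..size)
}
```

The point of the sketch is that the caller only ever sees the boxed reader, so the default `ObjectStore`-backed implementation and a custom one like the above would be interchangeable from the perspective of `ParquetOpener`.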
   
   Thoughts @thinkharderdev @Cheappie @alamb @crepererum ?
   
   **Describe alternatives you've considered**
   
   We could choose not to do this.
   
   **Additional context**
   

