waynr opened a new issue, #14804:
URL: https://github.com/apache/datafusion/issues/14804

   ### Is your feature request related to a problem or challenge?
   
   In https://github.com/apache/arrow-rs/issues/7155 I've described a general 
need for the `ObjectStore` trait to be able to support passing contextual data 
to custom implementations. In https://github.com/apache/arrow-rs/pull/7160 I 
have implemented an approach to this by providing the ability for `GetOptions` 
to store opaque instances of values indexed by their `TypeId`, [similar to what 
is possible in datafusion with 
`SessionConfig`](https://docs.rs/datafusion/latest/datafusion/prelude/struct.SessionConfig.html#method.with_extension).
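
To make the "opaque values indexed by their `TypeId`" idea concrete, here is a minimal, std-only sketch of such a map. The `Extensions` type below is a simplified stand-in written for illustration, not the actual type from the arrow-rs PR, and `ParentSpanId` is a hypothetical piece of contextual data.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::Arc;

/// Illustrative stand-in for a TypeId-keyed extension map
/// (an assumption for this sketch, not the real `object_store` type).
#[derive(Clone, Default)]
struct Extensions {
    map: HashMap<TypeId, Arc<dyn Any + Send + Sync>>,
}

impl Extensions {
    /// Store a value, replacing any previous value of the same type.
    fn insert<T: Any + Send + Sync>(&mut self, value: T) {
        self.map.insert(TypeId::of::<T>(), Arc::new(value));
    }

    /// Look up the value stored for type `T`, if any.
    fn get<T: Any + Send + Sync>(&self) -> Option<&T> {
        self.map
            .get(&TypeId::of::<T>())
            .and_then(|v| (**v).downcast_ref::<T>())
    }
}

/// Hypothetical contextual value, e.g. the ID of a parent tracing span.
#[derive(Debug, PartialEq)]
struct ParentSpanId(u64);

fn main() {
    let mut ext = Extensions::default();
    ext.insert(ParentSpanId(42));

    // The value is retrieved by its type alone; no string key is needed.
    assert_eq!(ext.get::<ParentSpanId>(), Some(&ParentSpanId(42)));
    // Types that were never inserted are simply absent.
    assert!(ext.get::<String>().is_none());
}
```

Because the key is the type itself, callers and custom store implementations only need to agree on a shared type, and unrelated extensions cannot collide.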
   
   This issue is about incorporating this new `ObjectStore` behavior here in 
datafusion, such that the custom data on a `SessionConfig` is passed along when 
creating `GetOptions` instances for retrieving files from an object store.
   
   ### Describe the solution you'd like
   
   I think the simplest approach here would be one where we create a new 
`ObjectStore` implementation during query execution that looks something like:
   
   ```
   struct ContextualizedObjectStore {
       inner: Arc<dyn ObjectStore>,
       extensions: object_store::Extensions,
   }
   ```
   
   We would then have a `get_opts` method in its `ObjectStore` trait impl that 
looks something like:
   
   ```
       async fn get_opts(
           &self,
           location: &Path,
           mut options: GetOptions,
       ) -> object_store::Result<GetResult> {
           options.extensions = self.extensions.clone();
           self.inner.get_opts(location, options).await
       }
   ```
   
   Initializing instances of this new type as a wrapper around whatever 
`Arc<dyn ObjectStore>` is available would look something like:
   
   ```
           let object_store = context
               .runtime_env()
               .object_store(&self.object_store_url)
               .map(|inner: Arc<dyn ObjectStore>| -> Arc<dyn ObjectStore> {
                   Arc::new(ContextualizedObjectStore::new(
                       inner,
                       context.session_config().clone_extensions_for_object_store(),
                   ))
               });
   ```
   
   With this approach, whenever the resulting `Arc<dyn ObjectStore>` is used to 
retrieve a file from object store, the underlying implementation would have 
access to the `object_store::Extensions` created from the `SessionConfig` 
extensions.
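
The wrapper pattern described above can be sketched end to end with std-only stand-ins. Everything below is an assumption made for illustration: the `ObjectStore` trait is reduced to a single synchronous method (the real one is async and much larger), `Extensions` and `GetOptions` are simplified, and `TracingStore` and `ParentSpanId` are hypothetical.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins; the names mirror the issue, but none of these
// definitions are the real `object_store` API.
#[derive(Clone, Default)]
struct Extensions(HashMap<TypeId, Arc<dyn Any + Send + Sync>>);

impl Extensions {
    fn insert<T: Any + Send + Sync>(&mut self, value: T) {
        self.0.insert(TypeId::of::<T>(), Arc::new(value));
    }
    fn get<T: Any + Send + Sync>(&self) -> Option<&T> {
        self.0
            .get(&TypeId::of::<T>())
            .and_then(|v| (**v).downcast_ref::<T>())
    }
}

#[derive(Default)]
struct GetOptions {
    extensions: Extensions,
}

// Synchronous for brevity; returns a description of the request.
trait ObjectStore: Send + Sync {
    fn get_opts(&self, location: &str, options: GetOptions) -> String;
}

/// Hypothetical contextual value: the span that accesses should be parented to.
struct ParentSpanId(u64);

/// Hypothetical custom store that reads the span ID from the options.
struct TracingStore;

impl ObjectStore for TracingStore {
    fn get_opts(&self, location: &str, options: GetOptions) -> String {
        match options.extensions.get::<ParentSpanId>() {
            Some(span) => format!("get {location} (parent span {})", span.0),
            None => format!("get {location} (no parent span)"),
        }
    }
}

/// Wrapper that attaches session-level extensions to every request.
struct ContextualizedObjectStore {
    inner: Arc<dyn ObjectStore>,
    extensions: Extensions,
}

impl ObjectStore for ContextualizedObjectStore {
    fn get_opts(&self, location: &str, mut options: GetOptions) -> String {
        // As in the issue's sketch, this replaces any extensions already set.
        options.extensions = self.extensions.clone();
        self.inner.get_opts(location, options)
    }
}

fn main() {
    let mut extensions = Extensions::default();
    extensions.insert(ParentSpanId(7));

    let store: Arc<dyn ObjectStore> = Arc::new(ContextualizedObjectStore {
        inner: Arc::new(TracingStore),
        extensions,
    });
    println!("{}", store.get_opts("data/file.parquet", GetOptions::default()));
}
```

Callers keep using a plain `Arc<dyn ObjectStore>` and never see the wrapper; only store implementations that look for a particular extension type behave differently.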
   
   ### Describe alternatives you've considered
   
   This is covered in https://github.com/apache/arrow-rs/issues/7155 and 
https://github.com/apache/arrow-rs/issues/7135.
   
   Basically, there are two alternative directions:
   
   * Update the `ObjectStore` API by providing optional trait methods that take 
an actual context type that can carry custom/extension data.
     * Considered by maintainers to be too heavy-handed.
   * Don't do anything.
     * This means for my use case, we wouldn't be able to properly parent 
tracing spans for object store accesses that happen during query execution.
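
For comparison, the first alternative (optional trait methods that take a context type) might look roughly like the sketch below. The `Store` trait, `RequestContext`, and both implementations are hypothetical and heavily simplified; they are not proposed signatures for the real `ObjectStore` trait.

```rust
/// Hypothetical context type; not part of the real `object_store` crate.
#[derive(Default)]
struct RequestContext {
    parent_span_id: Option<u64>,
}

/// Simplified, synchronous stand-in for the `ObjectStore` trait.
trait Store {
    fn get(&self, location: &str) -> String;

    /// Optional method: implementations that ignore context inherit this default.
    fn get_with_context(&self, location: &str, _ctx: &RequestContext) -> String {
        self.get(location)
    }
}

/// A store that does not care about context and only implements `get`.
struct PlainStore;

impl Store for PlainStore {
    fn get(&self, location: &str) -> String {
        format!("get {location}")
    }
}

/// A store that overrides the optional method to use the context.
struct SpanAwareStore;

impl Store for SpanAwareStore {
    fn get(&self, location: &str) -> String {
        format!("get {location}")
    }

    fn get_with_context(&self, location: &str, ctx: &RequestContext) -> String {
        match ctx.parent_span_id {
            Some(id) => format!("get {location} (parent span {id})"),
            None => self.get(location),
        }
    }
}

fn main() {
    let ctx = RequestContext { parent_span_id: Some(7) };
    println!("{}", PlainStore.get_with_context("data/file.parquet", &ctx));
    println!("{}", SpanAwareStore.get_with_context("data/file.parquet", &ctx));
}
```

The cost of this direction is that each operation needing context gains a second trait method, which is one way to read the "too heavy-handed" concern.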
   
   ### Additional context
   
   _No response_



