JanKaul commented on issue #3777:
URL: https://github.com/apache/arrow-datafusion/issues/3777#issuecomment-1330163584

   I think the biggest issue with making `SchemaProvider` async is that it 
makes query planning async as well. The async call enters the query planner 
through the `ContextProvider` implementation for `SessionState`:
   
   ```rust
   impl ContextProvider for SessionState {
       fn get_table_provider(&self, name: TableReference) -> Result<Arc<dyn TableSource>> {
           let resolved_ref = self.resolve_table_ref(name);
           match self.schema_for_ref(resolved_ref) {
               Ok(schema) => {
                   // <= async would be introduced here
                   let provider = schema.table(resolved_ref.table).ok_or_else(|| {
                       DataFusionError::Plan(format!(
                           "table '{}.{}.{}' not found",
                           resolved_ref.catalog, resolved_ref.schema, resolved_ref.table
                       ))
                   })?;
                   Ok(provider_as_source(provider))
               }
               Err(e) => Err(e),
           }
       }
       ...
   }
   ```
   
   I don't see a clean way to keep `ContextProvider` sync while 
`SchemaProvider` is async. If you are pre-fetching the tables anyway, you 
could already do that inside the `SchemaProvider`.
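
To illustrate the pre-fetching idea, here is a minimal sketch using only the standard library. It is not DataFusion code: `TableInfo`, `RemoteCatalog`, `PrefetchedSchema` and `StaticCatalog` are hypothetical names, and the blocking `list_tables` call stands in for whatever async fetch the real catalog would need. The slow call happens once, before planning, and lookups afterwards stay sync.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the table metadata a provider would hold.
#[derive(Clone, Debug, PartialEq)]
struct TableInfo {
    location: String,
}

/// Hypothetical slow catalog. In a real system `list_tables` would be
/// the async (remote) call; here it is a plain blocking method.
trait RemoteCatalog {
    fn list_tables(&self) -> Vec<(String, TableInfo)>;
}

/// Snapshot of the catalog, taken once before planning starts.
struct PrefetchedSchema {
    tables: HashMap<String, TableInfo>,
}

impl PrefetchedSchema {
    /// Do the expensive fetch up front and keep the result in memory.
    fn load(remote: &dyn RemoteCatalog) -> Self {
        Self {
            tables: remote.list_tables().into_iter().collect(),
        }
    }

    /// Synchronous lookup, so a sync planner can call it directly.
    fn table(&self, name: &str) -> Option<&TableInfo> {
        self.tables.get(name)
    }
}

/// Fixed catalog with made-up contents, used only for illustration.
struct StaticCatalog;

impl RemoteCatalog for StaticCatalog {
    fn list_tables(&self) -> Vec<(String, TableInfo)> {
        vec![(
            "orders".to_string(),
            TableInfo {
                location: "path/to/orders".to_string(),
            },
        )]
    }
}
```

The trade-off is that the snapshot can go stale: tables created in the remote catalog after `load` are invisible until the snapshot is refreshed.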
   
   One approach could be to keep an in-memory mirror of the `SchemaProvider` 
that stores only the essential data. Every time its state changes, it 
schedules an async task to update the actual storage without awaiting the 
response. However, this could lead to race conditions if multiple users 
access the storage simultaneously.
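
A rough sketch of the mirror idea, using only the standard library. All names (`TableMeta`, `Change`, `MirroredSchema`, `spawn_writer`) are hypothetical, and a background thread fed by a channel stands in for the scheduled async task. Reads and writes against the mirror are sync; the write to the actual storage is queued and never awaited.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::RwLock;
use std::thread::{self, JoinHandle};

/// Hypothetical "essential data" kept in memory for each table.
#[derive(Clone, Debug, PartialEq)]
struct TableMeta {
    location: String,
}

/// A change that still has to be written to the actual storage.
#[derive(Debug, PartialEq)]
enum Change {
    Register(String, TableMeta),
    Deregister(String),
}

/// In-memory mirror: lookups are sync, storage updates are queued.
struct MirroredSchema {
    tables: RwLock<HashMap<String, TableMeta>>,
    pending: Sender<Change>,
}

impl MirroredSchema {
    fn new() -> (Self, Receiver<Change>) {
        let (tx, rx) = channel();
        let schema = Self {
            tables: RwLock::new(HashMap::new()),
            pending: tx,
        };
        (schema, rx)
    }

    /// Synchronous lookup against the mirror only.
    fn table(&self, name: &str) -> Option<TableMeta> {
        self.tables.read().unwrap().get(name).cloned()
    }

    /// Update the mirror, then queue the storage write without waiting.
    fn register_table(&self, name: &str, meta: TableMeta) {
        self.tables
            .write()
            .unwrap()
            .insert(name.to_string(), meta.clone());
        // Fire and forget: a send error (writer gone) is ignored here.
        let _ = self.pending.send(Change::Register(name.to_string(), meta));
    }

    fn deregister_table(&self, name: &str) -> Option<TableMeta> {
        let removed = self.tables.write().unwrap().remove(name);
        if removed.is_some() {
            let _ = self.pending.send(Change::Deregister(name.to_string()));
        }
        removed
    }
}

/// Background writer standing in for the async task. A real version
/// would do remote I/O per change; this one only records the changes.
fn spawn_writer(rx: Receiver<Change>) -> JoinHandle<Vec<Change>> {
    thread::spawn(move || {
        let mut applied = Vec::new();
        // The loop ends once every sender has been dropped.
        for change in rx {
            applied.push(change);
        }
        applied
    })
}
```

A single channel keeps the changes of one process in order, but nothing here coordinates several processes writing to the same storage, and a failed write is never reported back to the caller.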

