JanKaul commented on issue #3777:
URL:
https://github.com/apache/arrow-datafusion/issues/3777#issuecomment-1330163584
I think the biggest issue with making `SchemaProvider` async is that it
makes query planning async. The async call would be introduced into the query
planner in the `ContextProvider` implementation for `SessionState`:
```rust
impl ContextProvider for SessionState {
    fn get_table_provider(&self, name: TableReference) -> Result<Arc<dyn TableSource>> {
        let resolved_ref = self.resolve_table_ref(name);
        match self.schema_for_ref(resolved_ref) {
            Ok(schema) => {
                // async would be introduced here, at the `schema.table(..)` call
                let provider = schema.table(resolved_ref.table).ok_or_else(|| {
                    DataFusionError::Plan(format!(
                        "table '{}.{}.{}' not found",
                        resolved_ref.catalog, resolved_ref.schema, resolved_ref.table
                    ))
                })?;
                Ok(provider_as_source(provider))
            }
            Err(e) => Err(e),
        }
    }
    ...
}
```
And I don't see a clean way to keep `ContextProvider` sync while
`SchemaProvider` is async. If you are pre-fetching the tables anyway, you
could already do that inside the `SchemaProvider`.
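To make the pre-fetching idea concrete, here is a rough sketch using only the
standard library. All names (`RemoteCatalog`, `PrefetchedSchema`, `block_on`)
are hypothetical and not part of DataFusion, and it assumes the table names a
query refers to are known before planning starts. The only async step is the
pre-fetch; the lookup the planner would call stays sync.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical async catalog: every lookup may involve I/O.
struct RemoteCatalog {
    tables: HashMap<String, String>,
}

impl RemoteCatalog {
    async fn table(&self, name: &str) -> Option<String> {
        // A real implementation would await a network request here.
        self.tables.get(name).cloned()
    }
}

/// Sync snapshot that a planner-facing provider could consult.
struct PrefetchedSchema {
    tables: HashMap<String, String>,
}

impl PrefetchedSchema {
    /// The only async step: resolve the needed tables up front.
    async fn prefetch(catalog: &RemoteCatalog, names: &[&str]) -> Self {
        let mut tables = HashMap::new();
        for &name in names {
            if let Some(table) = catalog.table(name).await {
                tables.insert(name.to_string(), table);
            }
        }
        Self { tables }
    }

    /// Sync lookup, so the planner never has to await.
    fn table(&self, name: &str) -> Option<&String> {
        self.tables.get(name)
    }
}

/// Waker that does nothing; enough for the busy-polling loop below.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor for this demo only; real code would use an async runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
        std::thread::yield_now();
    }
}
```

The catch is visible in the signature of `prefetch`: something has to know
the table names, and await the result, before the sync planner runs.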
One approach could be to have an in-memory mirror of the `SchemaProvider`
that stores only the essential data. Every time its state changes, it
schedules an async task to update the actual storage without awaiting the
response. This, however, could lead to race conditions (for example, lost or
stale updates) if multiple users access the storage simultaneously.
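A minimal sketch of such a mirror, under a few loud assumptions: a
`std::thread` plus a channel stands in for the async task, a shared `HashMap`
stands in for the actual storage, and every name here (`MirroredSchema`,
`TableMeta`, `Change`, `RemoteStore`) is made up for illustration rather than
taken from DataFusion.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex, RwLock};
use std::thread::{self, JoinHandle};

/// Hypothetical "essential data" kept per table (not a DataFusion type).
#[derive(Clone, Debug, PartialEq)]
struct TableMeta {
    location: String,
}

/// Hypothetical change record handed to the background writer.
enum Change {
    Register(String, TableMeta),
    Deregister(String),
}

/// Stand-in for the actual storage; a real one would sit behind the network.
type RemoteStore = Arc<Mutex<HashMap<String, TableMeta>>>;

/// In-memory mirror: sync reads and writes, write-behind to the storage.
struct MirroredSchema {
    tables: RwLock<HashMap<String, TableMeta>>,
    changes: Sender<Change>,
}

impl MirroredSchema {
    /// Starts the background writer and returns the mirror plus its handle.
    fn new(remote: RemoteStore) -> (Self, JoinHandle<()>) {
        let (changes, queue) = channel();
        let writer = thread::spawn(move || {
            // Runs until every `Sender` has been dropped.
            for change in queue {
                let mut store = remote.lock().unwrap();
                match change {
                    Change::Register(name, meta) => {
                        store.insert(name, meta);
                    }
                    Change::Deregister(name) => {
                        store.remove(&name);
                    }
                }
            }
        });
        let mirror = Self { tables: RwLock::new(HashMap::new()), changes };
        (mirror, writer)
    }

    /// Sync lookup served entirely from memory.
    fn table(&self, name: &str) -> Option<TableMeta> {
        self.tables.read().unwrap().get(name).cloned()
    }

    /// Updates the mirror, then queues the write without waiting for it.
    fn register_table(&self, name: &str, meta: TableMeta) {
        self.tables.write().unwrap().insert(name.to_string(), meta.clone());
        // A send error means the writer is gone; ignored in this sketch.
        let _ = self.changes.send(Change::Register(name.to_string(), meta));
    }

    /// Removes from the mirror, then queues the removal.
    fn deregister_table(&self, name: &str) {
        self.tables.write().unwrap().remove(name);
        let _ = self.changes.send(Change::Deregister(name.to_string()));
    }
}
```

Note that the sketch does nothing about the race mentioned above: a second
process writing to the same storage would neither see nor be seen by this
mirror until something reconciles the two.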