avantgardnerio commented on PR #3955:
URL: https://github.com/apache/arrow-datafusion/pull/3955#issuecomment-1290713704

   > there's some discussion before
   
   @yahoNanJing I am not familiar with that discussion (or don't recall it). Do you have a link to the PR or GitHub issue?
   
   > wondering why it's changed back to async
   
   Because [creating TableProviders](https://github.com/apache/arrow-datafusion/blob/7559c4425e6f32655c6d09e8ed17c9c51896472b/datafusion/core/src/execution/context.rs#L432) may have to be an async operation for providers like Deltalake that need to load their schema from the network.
   
   I looked into the alternative: having two methods on `TableProviderFactory`:
   
   1. `async fn create(url: String)` - what we have now
   2. `fn with_schema(url: String, schema: SchemaRef)` - so that, in theory, when deserializing `TableProviders` we could skip the async operation by reusing the schema, which should already have been serialized in the scan.
   
   Unfortunately, it did not look trivial to serialize all the state that a `DeltaTable` sets up during planning, so based on @andygrove's suggestion I switched it to async.
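
   For illustration only, the two-method alternative might have looked roughly like the sketch below. It uses only the standard library, and everything except the names `TableProviderFactory`, `TableProvider`, `SchemaRef`, `create` and `with_schema` is an assumption: `Schema`, `RemoteTable`, `RemoteTableFactory`, `BoxFuture` and the toy `block_on` are hypothetical stand-ins, not the actual DataFusion or delta-rs types or signatures.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical, simplified stand-ins for the real schema and provider types.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<String>,
}
pub type SchemaRef = Arc<Schema>;

pub trait TableProvider {
    fn schema(&self) -> SchemaRef;
    fn url(&self) -> &str;
}

// Hypothetical provider whose schema normally lives behind a network call.
pub struct RemoteTable {
    url: String,
    schema: SchemaRef,
}

impl TableProvider for RemoteTable {
    fn schema(&self) -> SchemaRef {
        self.schema.clone()
    }
    fn url(&self) -> &str {
        &self.url
    }
}

// Boxed future alias so the trait needs no external async-trait crate.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + 'a>>;

pub trait TableProviderFactory {
    // 1. Async creation: may need I/O to discover the schema.
    fn create(&self, url: String) -> BoxFuture<'_, Arc<dyn TableProvider>>;
    // 2. Sync creation: reuses a schema that was already serialized.
    fn with_schema(&self, url: String, schema: SchemaRef) -> Arc<dyn TableProvider>;
}

pub struct RemoteTableFactory;

impl TableProviderFactory for RemoteTableFactory {
    fn create(&self, url: String) -> BoxFuture<'_, Arc<dyn TableProvider>> {
        Box::pin(async move {
            // A real implementation would await a network request here;
            // this sketch hard-codes a one-column schema instead.
            let schema = Arc::new(Schema { fields: vec!["id".to_string()] });
            Arc::new(RemoteTable { url, schema }) as Arc<dyn TableProvider>
        })
    }

    fn with_schema(&self, url: String, schema: SchemaRef) -> Arc<dyn TableProvider> {
        Arc::new(RemoteTable { url, schema })
    }
}

// Toy executor: polls in a loop with a waker that does nothing.
// Only suitable for futures that complete without real I/O, as in this sketch.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

   In this sketch, `with_schema` only restores the schema. Any other state a provider builds up during planning would still have to be serialized separately, which is the part that did not look trivial for `DeltaTable`.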

