GitBox Mon, 28 Nov 2022 09:53:55 -0800


tustvold commented on issue #3777:
URL: https://github.com/apache/arrow-datafusion/issues/3777#issuecomment-1329512505


   > where the information cannot simply be stored in memory.
   
   Looking at `SchemaProvider`, the only thing it needs to do is provide 
access to a `TableProvider` by name; it doesn't need any more information 
than that.
   
   The constraint then becomes what information is needed to construct a 
`TableProvider`, which boils down to what information a `TableProvider` needs 
to be able to provide. Currently this is just the schema. There is also 
support for statistics, but I'm not sure this is exploited anywhere.
   
   My question is therefore, **are there use-cases where the number of tables 
exceeds what can be stored in memory**? If not, I don't see a compelling reason 
to make `SchemaProvider` async. We could potentially make the `TableProvider` 
methods async to allow deferred loading of metadata, but I think 
`SchemaProvider` itself can remain sync?
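
   To make the split concrete, here is a rough sketch of a sync lookup paired 
with async metadata. The traits are simplified stand-ins and not the actual 
DataFusion signatures; `Schema`, `LazyTable`, `MemorySchema`, 
`example_schema` and `block_on` are hypothetical names made up for 
illustration, and the busy-polling `block_on` only stands in for a real 
executor.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Simplified stand-in for a table schema: just a list of column names.
type Schema = Vec<String>;

type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Hypothetical trait: metadata access is async so it can be loaded lazily.
trait TableProvider: Send + Sync {
    fn schema(&self) -> BoxFuture<'_, Schema>;
}

/// Hypothetical trait: looking a table up by name stays synchronous.
trait SchemaProvider {
    fn table_names(&self) -> Vec<String>;
    fn table(&self, name: &str) -> Option<Arc<dyn TableProvider>>;
}

/// A table that only knows where its metadata lives until asked for it.
struct LazyTable {
    location: String,
}

impl TableProvider for LazyTable {
    fn schema(&self) -> BoxFuture<'_, Schema> {
        Box::pin(async move {
            // A real implementation would fetch metadata from `location` here.
            vec![format!("{}_id", self.location)]
        })
    }
}

/// Holds every table handle in memory; only the handles, not their metadata.
struct MemorySchema {
    tables: HashMap<String, Arc<dyn TableProvider>>,
}

impl SchemaProvider for MemorySchema {
    fn table_names(&self) -> Vec<String> {
        let mut names: Vec<String> = self.tables.keys().cloned().collect();
        names.sort();
        names
    }

    fn table(&self, name: &str) -> Option<Arc<dyn TableProvider>> {
        self.tables.get(name).cloned()
    }
}

/// Builds a schema with a single lazily described table.
fn example_schema() -> MemorySchema {
    let mut tables: HashMap<String, Arc<dyn TableProvider>> = HashMap::new();
    tables.insert(
        "orders".to_string(),
        Arc::new(LazyTable { location: "orders".to_string() }),
    );
    MemorySchema { tables }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for demonstration only.
fn block_on<T>(fut: impl Future<Output = T>) -> T {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

   In this sketch `example_schema().table("orders")` returns immediately 
because only the handle is held in memory, and the metadata is fetched when 
the future from `schema()` is awaited. That only works if all table names 
and handles fit in memory, which is exactly the question above.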



[GitHub] [arrow-datafusion] tustvold commented on issue #3777: An asynchronous version of `CatalogList`/`CatalogProvider`/`SchemaProvider`
