tustvold commented on issue #2445: URL: https://github.com/apache/arrow-datafusion/issues/2445#issuecomment-1120419414
I think it is important to keep a separation between:

* Catalog: what data files are where, what schema they have, what encoding they use, etc.
* Data Access: how to get the data of a specific file

In particular, there is a very common use case where an additional catalog is used to improve query performance, and by keeping the concerns separate we can ensure this is well supported.

Currently I would view the catalog abstraction as `SchemaProvider`/`TableProvider`, and the data access as `ObjectStore`, but there is definitely potential to extract common catalog logic, and doing so as suggested by @alamb is a good idea :+1:

FWIW I created some tickets a while back on supporting external catalogs (e.g. https://github.com/apache/arrow-datafusion/issues/2206, https://github.com/apache/arrow-datafusion/issues/2208 and https://github.com/apache/arrow-datafusion/issues/2209) which may be relevant here. I also created tickets to make the file operators themselves less coupled with the catalog: https://github.com/apache/arrow-datafusion/issues/2291 and https://github.com/apache/arrow-datafusion/issues/2293.
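To make the catalog/data-access split concrete, here is a minimal sketch in plain Rust. Every name in it (`Catalog`, `DataAccess`, `FileEntry`, `InMemoryCatalog`, `InMemoryStore`, `scan_table`, `demo`) is hypothetical and chosen for illustration; none of this is DataFusion's actual `SchemaProvider`/`TableProvider`/`ObjectStore` API, and the real traits are async and considerably richer.

```rust
use std::collections::HashMap;
use std::io;

/// Catalog-side description of one data file (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct FileEntry {
    pub path: String,
    pub columns: Vec<String>, // stand-in for a real schema
    pub encoding: String,     // e.g. "parquet" or "csv"
}

/// Catalog: which files belong to a table and how they are described.
pub trait Catalog {
    fn files_for_table(&self, table: &str) -> Vec<FileEntry>;
}

/// Data access: how to fetch the bytes of one specific file.
pub trait DataAccess {
    fn read(&self, path: &str) -> io::Result<Vec<u8>>;
}

/// Toy catalog backed by a map from table name to file entries.
pub struct InMemoryCatalog {
    pub tables: HashMap<String, Vec<FileEntry>>,
}

impl Catalog for InMemoryCatalog {
    fn files_for_table(&self, table: &str) -> Vec<FileEntry> {
        // Unknown tables simply have no files.
        self.tables.get(table).cloned().unwrap_or_default()
    }
}

/// Toy store backed by a map from path to bytes.
pub struct InMemoryStore {
    pub objects: HashMap<String, Vec<u8>>,
}

impl DataAccess for InMemoryStore {
    fn read(&self, path: &str) -> io::Result<Vec<u8>> {
        self.objects.get(path).cloned().ok_or_else(|| {
            io::Error::new(io::ErrorKind::NotFound, format!("no object at {}", path))
        })
    }
}

/// A scan depends on both traits but on neither implementation, so an
/// external catalog can be swapped in without touching the store.
pub fn scan_table(
    catalog: &dyn Catalog,
    store: &dyn DataAccess,
    table: &str,
) -> io::Result<usize> {
    let mut total_bytes = 0;
    for entry in catalog.files_for_table(table) {
        total_bytes += store.read(&entry.path)?.len();
    }
    Ok(total_bytes)
}

/// Wires the toy catalog and store together and scans table "t".
pub fn demo() -> io::Result<usize> {
    let mut tables = HashMap::new();
    tables.insert(
        "t".to_string(),
        vec![
            FileEntry {
                path: "data/a.parquet".to_string(),
                columns: vec!["id".to_string()],
                encoding: "parquet".to_string(),
            },
            FileEntry {
                path: "data/b.parquet".to_string(),
                columns: vec!["id".to_string()],
                encoding: "parquet".to_string(),
            },
        ],
    );
    let catalog = InMemoryCatalog { tables };

    let mut objects = HashMap::new();
    objects.insert("data/a.parquet".to_string(), vec![0u8; 3]);
    objects.insert("data/b.parquet".to_string(), vec![0u8; 5]);
    let store = InMemoryStore { objects };

    scan_table(&catalog, &store, "t")
}
```

Because `scan_table` only sees the two traits, replacing `InMemoryCatalog` with a catalog that consults an external index would not require any change to the data-access side, which is the property argued for above.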
