[GitHub] [arrow-datafusion] tustvold commented on issue #2445: ObjectStore Directory Semantics


tustvold commented on issue #2445:
URL: 
https://github.com/apache/arrow-datafusion/issues/2445#issuecomment-1120419414


   I think it is important to keep a separation between:
   
   * Catalog: what data files are where, what schema they have, how they are 
encoded, etc.
   * Data Access: how to get the data of a specific file
   
   In particular, there is a very common use case where an additional catalog 
is used to improve query performance, and by keeping the concerns separate we 
can ensure this is well supported.
   
   Currently I would view the catalog abstraction as 
`SchemaProvider`/`TableProvider`, and the data access as `ObjectStore`, but 
extracting common catalog logic, as suggested by @alamb, is a good idea :+1:
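   
   As a rough illustration of that split, here is a minimal sketch. The 
`Catalog` and `DataAccess` traits, `FileEntry`, `StaticCatalog`, 
`InMemoryStore` and `scan` are all hypothetical names made up for this example; 
they are not the actual `SchemaProvider`/`TableProvider`/`ObjectStore` APIs, 
and real implementations would be async and considerably richer.

```rust
use std::collections::HashMap;

/// Hypothetical record of one data file, as a catalog might track it.
#[derive(Debug, Clone, PartialEq)]
pub struct FileEntry {
    pub path: String,
    pub encoding: String,
    pub columns: Vec<String>,
}

/// Catalog concern (hypothetical trait): which files make up a table,
/// and what schema and encoding they have.
pub trait Catalog {
    fn files(&self, table: &str) -> Vec<FileEntry>;
}

/// Data access concern (hypothetical trait): fetch the bytes of one file.
pub trait DataAccess {
    fn read(&self, path: &str) -> Option<Vec<u8>>;
}

/// A catalog backed by a fixed map; it could equally be an external service.
pub struct StaticCatalog {
    pub tables: HashMap<String, Vec<FileEntry>>,
}

impl Catalog for StaticCatalog {
    fn files(&self, table: &str) -> Vec<FileEntry> {
        self.tables.get(table).cloned().unwrap_or_default()
    }
}

/// A store that only knows how to return bytes for a path.
pub struct InMemoryStore {
    pub objects: HashMap<String, Vec<u8>>,
}

impl DataAccess for InMemoryStore {
    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.objects.get(path).cloned()
    }
}

/// The catalog decides which files to read; the store only fetches them.
/// The store is never asked to list or interpret its contents.
pub fn scan(catalog: &dyn Catalog, store: &dyn DataAccess, table: &str) -> Vec<Vec<u8>> {
    catalog
        .files(table)
        .iter()
        .filter_map(|file| store.read(&file.path))
        .collect()
}
```

   With this shape, swapping in an external catalog only means providing 
another `Catalog` implementation; nothing on the data access side changes.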
   
   FWIW I created some tickets a while back on supporting external catalogs 
(e.g. https://github.com/apache/arrow-datafusion/issues/2206, 
https://github.com/apache/arrow-datafusion/issues/2208 and 
https://github.com/apache/arrow-datafusion/issues/2209) which may be relevant 
here. I also created tickets to make the file operators themselves less coupled 
with the catalog - https://github.com/apache/arrow-datafusion/issues/2291 and 
https://github.com/apache/arrow-datafusion/issues/2293.


