rdettai edited a comment on issue #1220:
URL: 
https://github.com/apache/arrow-datafusion/issues/1220#issuecomment-957591340


   #1185 is also a follow-up to #1139 and is closely related to this issue. 
Maybe we can merge the two issues and create subtasks?
   
   @alamb my idea was that each standard/technique for getting the list of 
files (table catalog) should be a different provider. The listing provider 
might handle folder structures that differ slightly from the Hive one 
(e.g. `mytable/2021/11/02` instead of `mytable/year=2021/month=11/day=02`), but 
it focuses on setups where the partitions are encoded in the folder structure 
itself and are discovered by **"listing"** the file system. Most of the code 
inside the `datasource/listing` module should be specialized to do precisely 
that (e.g. choose a listing strategy, parse the paths...). Everything else can 
be factored out into a common module for reuse in other table providers 😊.
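
To make the "parse the paths" part concrete, here is a rough sketch of how 
partition values could be read from either layout. The function name 
`parse_partition_values` and its signature are made up for illustration and are 
not part of DataFusion's API; the sketch also assumes that every folder between 
the table root and the file name is a partition folder.

```rust
/// Hypothetical helper (not DataFusion API): extracts partition values from a
/// file path, accepting both Hive-style (`year=2021`) and positional (`2021`)
/// folder names. Returns `None` when the path does not match the layout.
fn parse_partition_values(
    table_root: &str,
    file_path: &str,
    partition_cols: &[&str],
) -> Option<Vec<(String, String)>> {
    // Keep only the part of the path below the table root.
    let root = table_root.trim_end_matches('/');
    let relative = file_path.strip_prefix(root)?.strip_prefix('/')?;

    // Assumption: every segment except the last one (the file name) is a
    // partition folder.
    let mut segments: Vec<&str> = relative.split('/').collect();
    segments.pop();

    if segments.len() != partition_cols.len() {
        return None;
    }

    let mut values = Vec::with_capacity(segments.len());
    for (col, segment) in partition_cols.iter().zip(segments) {
        let value = match segment.split_once('=') {
            // Hive style: the key must match the expected column name.
            Some((key, value)) if key == *col => value,
            Some(_) => return None,
            // Positional style: the folder name itself is the value.
            None => segment,
        };
        values.push((col.to_string(), value.to_string()));
    }
    Some(values)
}
```

With partition columns `["year", "month", "day"]`, both 
`mytable/2021/11/02/file.parquet` and 
`mytable/year=2021/month=11/day=02/file.parquet` would yield the same three 
(column, value) pairs, which is the kind of path handling that could stay in the 
listing module while the rest is shared.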


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.