comphead commented on PR #17242:
URL: https://github.com/apache/datafusion/pull/17242#issuecomment-3243650877

   Thanks @waynexia for the diagram and explanation. 
   I definitely agree with the simplification. The abstractions are indeed more
flexible than needed, and simplifying them would be awesome. For instance, all
the details related to a specific data source can be encapsulated in the
DataSource provider implementation.
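
   A minimal Rust sketch of that encapsulation. `DataSourceProvider`, `OrcProvider`, `Batch` and `count_rows` are hypothetical names, not DataFusion APIs, and `Batch` stands in for an Arrow `RecordBatch`. The point is that ORC-only settings such as the stripe size stay inside the provider, while callers only see the trait.

   ```rust
   /// Stand-in for an Arrow `RecordBatch` (hypothetical simplification).
   #[derive(Debug, Clone, PartialEq)]
   pub struct Batch {
       pub rows: Vec<i64>,
   }

   /// Hypothetical trait: everything specific to one data source lives behind it.
   pub trait DataSourceProvider {
       /// Short name used for registration, e.g. "parquet" or "orc".
       fn name(&self) -> &str;
       /// Produce the batches for this source; how they are read stays private.
       fn scan(&self) -> Result<Vec<Batch>, String>;
   }

   /// Hypothetical ORC provider: ORC-specific knobs do not leak into shared configs.
   pub struct OrcProvider {
       pub path: String,
       pub stripe_rows: usize,
   }

   impl DataSourceProvider for OrcProvider {
       fn name(&self) -> &str {
           "orc"
       }

       fn scan(&self) -> Result<Vec<Batch>, String> {
           if self.path.is_empty() {
               return Err("ORC provider needs a file path".to_string());
           }
           // Stub: a real provider would decode ORC stripes from `self.path`.
           // Here one batch of `stripe_rows` rows is fabricated so the sketch runs.
           Ok(vec![Batch {
               rows: (0..self.stripe_rows as i64).collect(),
           }])
       }
   }

   /// Generic caller: works with any provider without knowing its format.
   pub fn count_rows(source: &dyn DataSourceProvider) -> Result<usize, String> {
       Ok(source.scan()?.iter().map(|b| b.rows.len()).sum())
   }
   ```

   In this sketch, onboarding another format means adding one more `impl DataSourceProvider`, with no change to generic callers such as `count_rows`.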
   
   For example, if a user would like to onboard the ORC file format,
   
   
   In the diagram, it might still be confusing to have the memory data source
under the file source configs.
   File sources are format dependent and therefore have format-specific readers
and configs, whereas the memory source has no dependency on a format, opener,
etc. It should still depend on some memory scan config, though.
   
   Here are some suggested changes to the proposal:

   ```
                      ┌────────────────────┐
                      │   TableProvider    │
                      │ (File / Memory)    │
                      └─────────┬──────────┘
                                │
             ┌──────────────────┴───────────────────┐
             │                                      │
    ┌────────▼─────────┐                   ┌────────▼───────┐
    │ FileTableProvider│                   │ MemoryProvider │
    └────────┬─────────┘                   └────────┬───────┘
             │                                      │
             │ uses                                 │ uses
             ▼                                      ▼
      ┌───────────────────┐              ┌───────────────────┐
      │   FileScanConfig  │              │  MemoryScanConfig │
      └──────────┬────────┘              └──────────┬────────┘
                 │                                  │
                 └─────────────┬────────────────────┘
                               │
                       ┌───────▼─────────┐
                       │   ScanConfig    │ (trait)
                       └───────┬─────────┘
                               │
           ┌───────────────────┼───────────────────┐
           │                   │                   │
      ┌────▼─────┐        ┌────▼─────┐        ┌────▼──────┐
      │FileFormat│        │FileOpener│        │FileStream │
      │(Parquet, │        └──────────┘        └───────────┘
      │ Avro,    │
      │ JSON, ..)│
      └──────────┘
   
   ```
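
   A rough Rust sketch of the `ScanConfig` split in the diagram, assuming the shared trait only needs a projection and a partition count (an assumption chosen for illustration). `FileScanConfig` keeps its format and file groups, while `MemoryScanConfig` holds only in-memory partitions, with no format or opener. These are simplified hypothetical versions: `FormatKind`, `describe` and all field names are illustrative and do not mirror DataFusion's actual types.

   ```rust
   /// Hypothetical shared trait: only what every scan needs.
   pub trait ScanConfig {
       /// Column indices to read; `None` means all columns.
       fn projection(&self) -> Option<&[usize]>;
       /// Number of partitions (parallel streams) the scan produces.
       fn partition_count(&self) -> usize;
   }

   /// Simplified stand-in for the formats behind `FileFormat`.
   #[derive(Debug, Clone, Copy, PartialEq)]
   pub enum FormatKind {
       Parquet,
       Avro,
       Json,
   }

   /// File scans carry format-dependent settings.
   pub struct FileScanConfig {
       pub format: FormatKind,
       /// One group of file paths per partition.
       pub file_groups: Vec<Vec<String>>,
       pub projection: Option<Vec<usize>>,
   }

   /// Memory scans need no format or opener, only the in-memory data.
   pub struct MemoryScanConfig {
       /// partition -> batches -> row values (stand-in for RecordBatches).
       pub partitions: Vec<Vec<Vec<i64>>>,
       pub projection: Option<Vec<usize>>,
   }

   impl ScanConfig for FileScanConfig {
       fn projection(&self) -> Option<&[usize]> {
           self.projection.as_deref()
       }
       fn partition_count(&self) -> usize {
           self.file_groups.len()
       }
   }

   impl ScanConfig for MemoryScanConfig {
       fn projection(&self) -> Option<&[usize]> {
           self.projection.as_deref()
       }
       fn partition_count(&self) -> usize {
           self.partitions.len()
       }
   }

   /// Planner-side code sees only the trait, never the concrete config.
   pub fn describe(config: &dyn ScanConfig) -> String {
       format!(
           "partitions={}, projection={:?}",
           config.partition_count(),
           config.projection()
       )
   }
   ```

   The design choice here is that anything format specific stays on `FileScanConfig`, so `MemoryScanConfig` never has to carry unused file-related fields.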
   
   However, as you correctly mentioned, `FileFormat`, `FileOpener`, and
`FileStream` could probably be encapsulated in a facade object that takes a
config and provides a `RecordBatchStream`, hiding all the specifics inside.
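
   A minimal sketch of such a facade, assuming a synchronous `Iterator` in place of DataFusion's async `RecordBatchStream` so that it runs on the standard library alone. `Format`, `Opener`, `ScanFacade` and `ScanFacadeConfig` are hypothetical names, and the toy format and opener exist only to make the example executable.

   ```rust
   /// Stand-in for an Arrow `RecordBatch`.
   type Batch = Vec<i64>;

   /// Plays the role of `FileFormat`: decodes raw bytes into batches.
   trait Format {
       fn decode(&self, bytes: &[u8]) -> Vec<Batch>;
   }

   /// Plays the role of `FileOpener`: fetches the bytes of one file.
   trait Opener {
       fn open(&self, path: &str) -> Vec<u8>;
   }

   /// Toy format: every byte becomes a one-row batch.
   struct ByteFormat;
   impl Format for ByteFormat {
       fn decode(&self, bytes: &[u8]) -> Vec<Batch> {
           bytes.iter().map(|b| vec![i64::from(*b)]).collect()
       }
   }

   /// Toy opener: pretends a file's contents are the bytes of its path.
   struct PathBytesOpener;
   impl Opener for PathBytesOpener {
       fn open(&self, path: &str) -> Vec<u8> {
           path.as_bytes().to_vec()
       }
   }

   struct ScanFacadeConfig {
       files: Vec<String>,
   }

   /// The facade: takes a config, returns a stream of batches, and keeps the
   /// format, the opener and per-file iteration (the `FileStream` part) private.
   struct ScanFacade {
       format: Box<dyn Format>,
       opener: Box<dyn Opener>,
   }

   impl ScanFacade {
       fn new_toy() -> Self {
           ScanFacade {
               format: Box::new(ByteFormat),
               opener: Box::new(PathBytesOpener),
           }
       }

       fn execute(&self, config: ScanFacadeConfig) -> impl Iterator<Item = Batch> + '_ {
           // Open each file lazily, decode it, and flatten into one batch stream.
           config.files.into_iter().flat_map(move |path| {
               let bytes = self.opener.open(&path);
               self.format.decode(&bytes)
           })
       }
   }
   ```

   In this sketch, switching formats only changes which `Format` and `Opener` the facade is built with; callers of `execute` are unaffected.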
   
   

