tustvold opened a new issue, #2293:
URL: https://github.com/apache/arrow-datafusion/issues/2293

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   Part of #2079 
   
   Following on from #2292 and #2291 it should be possible to pull the 
multi-file handling out of each individual file operator, and delegate it to 
the physical plan. As described in #2079 this will greatly simplify the 
implementations, whilst also hiding fewer details from the physical plan.
   
   **Describe the solution you'd like**
   
   Currently a FileScanConfig would result in `ListingTable::scan` generating a 
physical plan that looks something like
   
   ```
   ParquetExec
   ```
   
   I propose instead generating something like
   
   ```
   UnionExec
     ProjectionExec: ... // Partition 1
       SchemaAdapterExec
           ParquetExec: ... // Partition 1 File 1
       SchemaAdapterExec
           ParquetExec: ... // Partition 1 File 2
     ProjectionExec: ... // Partition 2
       SchemaAdapterExec
           ParquetExec: ... // Partition 2 File 1
       SchemaAdapterExec
           ParquetExec: ... // Partition 2 File 2
       SchemaAdapterExec
           ParquetExec: ... // Partition 2 File 3
   ```
   
   Whilst this plan is more complex, it results in less complexity in the file 
format operators, and should hopefully lead to fewer bugs like those in 
#2170 or #2000.
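
The shape of the proposed plan can be sketched with a toy model. This is only an illustration under stated assumptions: `PlanNode`, `build_plan` and `render` are hypothetical names, not DataFusion APIs, and `SchemaAdapterExec` is the operator proposed in this issue rather than an existing one. The sketch takes files already grouped by partition and wraps each file in its own scan and schema adapter, mirroring the tree above.

```rust
use std::fmt::Write;

/// Toy stand-in for a physical plan node. This is NOT DataFusion's
/// `ExecutionPlan` trait; it only mirrors the tree drawn in this issue.
#[derive(Debug, Clone, PartialEq)]
enum PlanNode {
    Union(Vec<PlanNode>),
    /// One projection per partition, with one input per file (as in the sketch).
    Projection { partition: usize, inputs: Vec<PlanNode> },
    /// Hypothetical operator proposed in this issue.
    SchemaAdapter(Box<PlanNode>),
    /// A scan over exactly one file, so the operator needs no multi-file logic.
    ParquetScan { file: String },
}

/// Build the proposed plan from files already grouped by partition.
fn build_plan(partitions: &[Vec<String>]) -> PlanNode {
    let per_partition: Vec<PlanNode> = partitions
        .iter()
        .enumerate()
        .map(|(idx, files)| PlanNode::Projection {
            partition: idx + 1, // 1-based, matching the comments in the sketch
            inputs: files
                .iter()
                .map(|file| {
                    PlanNode::SchemaAdapter(Box::new(PlanNode::ParquetScan {
                        file: file.clone(),
                    }))
                })
                .collect(),
        })
        .collect();
    PlanNode::Union(per_partition)
}

/// Render the plan as an indented tree, two spaces per level.
fn render(node: &PlanNode, depth: usize, out: &mut String) {
    let indent = "  ".repeat(depth);
    match node {
        PlanNode::Union(children) => {
            writeln!(out, "{}UnionExec", indent).unwrap();
            for child in children {
                render(child, depth + 1, out);
            }
        }
        PlanNode::Projection { partition, inputs } => {
            writeln!(out, "{}ProjectionExec: partition {}", indent, partition).unwrap();
            for input in inputs {
                render(input, depth + 1, out);
            }
        }
        PlanNode::SchemaAdapter(input) => {
            writeln!(out, "{}SchemaAdapterExec", indent).unwrap();
            render(input, depth + 1, out);
        }
        PlanNode::ParquetScan { file } => {
            writeln!(out, "{}ParquetExec: {}", indent, file).unwrap();
        }
    }
}
```

Calling `build_plan` with two partitions of two and three files, then `render(&plan, 0, &mut out)`, produces a tree with the same structure as the one above. The point of the design is that all multi-file and multi-partition structure lives in the tree, while each leaf only knows about a single file.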
   
   **Describe alternatives you've considered**
   
   We could leave the multi-file handling inside each file operator, as it is today.
   
   FYI @thinkharderdev @matthewmturner 
   


