tustvold opened a new issue, #2293:
URL: https://github.com/apache/arrow-datafusion/issues/2293
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
Part of #2079
Following on from #2292 and #2291 it should be possible to pull the
multi-file handling out of each individual file operator, and delegate it to
the physical plan. As described in #2079 this will greatly simplify the
implementations, whilst also hiding fewer details from the physical plan.
**Describe the solution you'd like**
Currently a `FileScanConfig` results in `ListingTable::scan` generating a
physical plan that looks something like
```
ParquetExec
```
I propose instead generating something like
```
UnionExec
  ProjectionExec: ... // Partition 1
    SchemaAdapterExec
      ParquetExec: ... // Partition 1 File 1
    SchemaAdapterExec
      ParquetExec: ... // Partition 1 File 2
  ProjectionExec: ... // Partition 2
    SchemaAdapterExec
      ParquetExec: ... // Partition 2 File 1
    SchemaAdapterExec
      ParquetExec: ... // Partition 2 File 2
    SchemaAdapterExec
      ParquetExec: ... // Partition 2 File 3
```
Whilst this plan is more complex, it results in less complexity in the file
format operators, and should hopefully lead to fewer bugs like
#2170 or #2000.
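
To make the proposed shape concrete, here is a rough, self-contained sketch of how per-partition file groups might be expanded into such a tree. `Node` and `build_plan` are hypothetical names invented for illustration; they are not DataFusion types or APIs. The operator names are only used as labels, copied from the plan above.

```rust
/// Hypothetical plan node used only for illustration; NOT a DataFusion type.
struct Node {
    label: String,
    children: Vec<Node>,
}

impl Node {
    fn new(label: impl Into<String>, children: Vec<Node>) -> Self {
        Node { label: label.into(), children }
    }

    /// Render the tree with two spaces of indentation per level.
    fn render(&self) -> String {
        let mut out = String::new();
        self.render_into(0, &mut out);
        out
    }

    fn render_into(&self, depth: usize, out: &mut String) {
        out.push_str(&"  ".repeat(depth));
        out.push_str(&self.label);
        out.push('\n');
        for child in &self.children {
            child.render_into(depth + 1, out);
        }
    }
}

/// Build the proposed shape: one `ProjectionExec` label per partition, and one
/// `SchemaAdapterExec` wrapping a single-file `ParquetExec` per file.
/// `file_groups[i]` holds the (assumed) file paths of partition `i + 1`.
fn build_plan(file_groups: &[Vec<&str>]) -> Node {
    let partitions: Vec<Node> = file_groups
        .iter()
        .enumerate()
        .map(|(p, files)| {
            let scans: Vec<Node> = files
                .iter()
                .map(|path| {
                    // Each scan operator only ever sees a single file.
                    let scan = Node::new(format!("ParquetExec: {}", path), vec![]);
                    Node::new("SchemaAdapterExec", vec![scan])
                })
                .collect();
            Node::new(format!("ProjectionExec: partition {}", p + 1), scans)
        })
        .collect();
    Node::new("UnionExec", partitions)
}
```

Calling `build_plan` with two file groups and printing `render()` gives an indented tree of the same shape as the plan above. The point of the sketch is that the multi-file logic lives entirely in the plan builder, so each leaf operator only has to handle one file.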
**Describe alternatives you've considered**
We could choose not to do this.
FYI @thinkharderdev @matthewmturner