berkaysynnada commented on PR #6273:
URL:
https://github.com/apache/arrow-datafusion/pull/6273#issuecomment-1539944036
> Is there and SQL / plan level test that could be written that would cover
this code? It seems like a neat optimization but I don't fully understand how
to map it to how it would affect queries
This PR optimizes the plan such:
```
query TT
explain
SELECT c1 FROM(
(
SELECT c1 FROM t1
)
UNION ALL
(
SELECT c1a FROM t2
))
ORDER BY c1
----
logical_plan
Sort: t1.c1 ASC NULLS LAST
Union
TableScan: t1 projection=[c1]
Projection: t2.c1a AS t1.c1
TableScan: t2 projection=[c1a]
physical_plan
SortPreservingMergeExec: [c1@0 ASC NULLS LAST]
UnionExec
CsvExec: file_groups={1 group:
[[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1],
output_ordering=[c1@0 ASC NULLS LAST], has_header=true
ProjectionExec: expr=[c1a@0 as t1.c1]
CsvExec: file_groups={1 group:
[[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1a],
output_ordering=[c1a@0 ASC NULLS LAST], has_header=true
```
However, the main version results such:
```
logical_plan
Sort: t1.c1 ASC NULLS LAST
Union
TableScan: t1 projection=[c1]
Projection: t2.c1a AS t1.c1
TableScan: t2 projection=[c1a]
physical_plan
SortPreservingMergeExec: [c1@0 ASC NULLS LAST]
UnionExec
CsvExec: file_groups={1 group:
[[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1],
output_ordering=[c1@0 ASC NULLS LAST], has_header=true
SortExec: expr=[c1@0 ASC NULLS LAST]
ProjectionExec: expr=[c1a@0 as t1.c1]
CsvExec: file_groups={1 group:
[[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1a],
output_ordering=[c1a@0 ASC NULLS LAST], has_header=true
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]