berkaysynnada commented on PR #6273:
URL: 
https://github.com/apache/arrow-datafusion/pull/6273#issuecomment-1539944036

   > Is there and SQL / plan level test that could be written that would cover 
this code? It seems like a neat optimization but I don't fully understand how 
to map it to how it would affect queries
   
   This PR optimizes the plan such:
   ```
   query TT
   explain
   SELECT c1 FROM(
   (   
       SELECT c1 FROM t1
   )  
   UNION ALL
   (   
       SELECT c1a FROM t2
   ))
   ORDER BY c1
   ----
   logical_plan
     Sort: t1.c1 ASC NULLS LAST
       Union
         TableScan: t1 projection=[c1]
         Projection: t2.c1a AS t1.c1
           TableScan: t2 projection=[c1a]
   physical_plan
     SortPreservingMergeExec: [c1@0 ASC NULLS LAST]
       UnionExec
         CsvExec: file_groups={1 group: 
[[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1], 
output_ordering=[c1@0 ASC NULLS LAST], has_header=true
           ProjectionExec: expr=[c1a@0 as t1.c1]
         CsvExec: file_groups={1 group: 
[[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1a], 
output_ordering=[c1a@0 ASC NULLS LAST], has_header=true
   ```
   
   However, the main version results such:
   ```
   logical_plan
     Sort: t1.c1 ASC NULLS LAST
       Union
         TableScan: t1 projection=[c1]
         Projection: t2.c1a AS t1.c1
           TableScan: t2 projection=[c1a]
   physical_plan
     SortPreservingMergeExec: [c1@0 ASC NULLS LAST]
       UnionExec
         CsvExec: file_groups={1 group: 
[[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1], 
output_ordering=[c1@0 ASC NULLS LAST], has_header=true
         SortExec: expr=[c1@0 ASC NULLS LAST]
           ProjectionExec: expr=[c1a@0 as t1.c1]
             CsvExec: file_groups={1 group: 
[[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1a], 
output_ordering=[c1a@0 ASC NULLS LAST], has_header=true
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to