Dandandan commented on code in PR #7364:
URL: https://github.com/apache/arrow-datafusion/pull/7364#discussion_r1303553019


##########
datafusion/sqllogictest/test_files/order.slt:
##########
@@ -410,3 +410,38 @@ SELECT DISTINCT time as "first_seen" FROM t ORDER BY 1;
 ## Cleanup
 statement ok
 drop table t;
+
+# Create a table having 3 columns which are ordering equivalent by the source. In the next step,
+# we will expect to observe the removed sort exec by propagating the orders across projection.
+statement ok
+CREATE EXTERNAL TABLE multiple_ordered_table (
+  a0 INTEGER,
+  a INTEGER,
+  b INTEGER,
+  c INTEGER,
+  d INTEGER
+)
+STORED AS CSV
+WITH HEADER ROW
+WITH ORDER (a ASC)
+WITH ORDER (b ASC)
+WITH ORDER (c ASC)
+LOCATION '../core/tests/data/window_2.csv';
+
+query TT
+EXPLAIN SELECT (b+a+c) AS result 
+FROM multiple_ordered_table
+ORDER BY result;
+----
+logical_plan
+Sort: result ASC NULLS LAST
+--Projection: multiple_ordered_table.b + multiple_ordered_table.a + multiple_ordered_table.c AS result
+----TableScan: multiple_ordered_table projection=[a, b, c]
+physical_plan
+SortPreservingMergeExec: [result@0 ASC NULLS LAST]

Review Comment:
   What I essentially mean is this:
   
   The batches of the table `multiple_ordered_table` are ordered, and that order is preserved by `RepartitionExec` and `ProjectionExec`. If `SortPreservingMergeExec` knew the sequence number of each batch (batch 0, batch 1, batch 2, etc.), it would only need to wait for batch 0, batch 1, batch 2, etc. to arrive from the partition streams in that order, rather than comparing the rows themselves, which would be much faster.
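
A minimal sketch of the batch-number idea, written in Python for brevity rather than in DataFusion's Rust. It assumes each batch is tagged with a sequence number at the source and that repartitioning hands batches out round-robin, so every partition stream yields increasing batch numbers. `round_robin` and `merge_by_batch_index` are hypothetical names for illustration, not DataFusion APIs.

```python
import heapq
from typing import Iterator, List, Tuple

# Hypothetical representation: (batch sequence number, rows in that batch).
Batch = Tuple[int, List[int]]


def round_robin(batches: List[List[int]], num_partitions: int) -> List[List[Batch]]:
    """Distribute already-ordered batches across partitions, tagging each
    batch with its sequence number (an assumption of this sketch)."""
    partitions: List[List[Batch]] = [[] for _ in range(num_partitions)]
    for index, rows in enumerate(batches):
        partitions[index % num_partitions].append((index, rows))
    return partitions


def merge_by_batch_index(partitions: List[List[Batch]]) -> Iterator[int]:
    """Merge partition streams by comparing one integer per batch
    instead of comparing individual rows."""
    # Each partition is sorted by batch number, so a k-way merge on that
    # number restores the original batch order.
    for _, rows in heapq.merge(*partitions, key=lambda batch: batch[0]):
        yield from rows


# Usage: five ordered batches spread over three partitions.
source = [[1, 2], [3, 4], [5, 6], [7, 8], [9]]
merged = list(merge_by_batch_index(round_robin(source, 3)))
```

The merge does one comparison per batch rather than one per row, which is where the expected speedup would come from. It is only valid if the batches are globally ordered at the source and no operator between the scan and the merge reorders rows or splits batches without carrying the sequence number along.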



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.