metesynnada commented on code in PR #7364:
URL: https://github.com/apache/arrow-datafusion/pull/7364#discussion_r1303563273


##########
datafusion/sqllogictest/test_files/order.slt:
##########
@@ -410,3 +410,38 @@ SELECT DISTINCT time as "first_seen" FROM t ORDER BY 1;
 ## Cleanup
 statement ok
 drop table t;
+
+# Create a table having 3 columns which are ordering equivalent by the source. In the next step,
+# we will expect to observe the removed sort exec by propagating the orders across projection.
+statement ok
+CREATE EXTERNAL TABLE multiple_ordered_table (
+  a0 INTEGER,
+  a INTEGER,
+  b INTEGER,
+  c INTEGER,
+  d INTEGER
+)
+STORED AS CSV
+WITH HEADER ROW
+WITH ORDER (a ASC)
+WITH ORDER (b ASC)
+WITH ORDER (c ASC)
+LOCATION '../core/tests/data/window_2.csv';
+
+query TT
+EXPLAIN SELECT (b+a+c) AS result 
+FROM multiple_ordered_table
+ORDER BY result;
+----
+logical_plan
+Sort: result ASC NULLS LAST
+--Projection: multiple_ordered_table.b + multiple_ordered_table.a + multiple_ordered_table.c AS result
+----TableScan: multiple_ordered_table projection=[a, b, c]
+physical_plan
+SortPreservingMergeExec: [result@0 ASC NULLS LAST]

Review Comment:
   Maintaining batch indices to preserve order is actually a good idea.
The current sort-preserving algorithm, designed to preserve order across hash
repartitioning, tends to overfit to row-level sorting. We could potentially
collaborate on implementing this optimization in future work.
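
   To make the batch-index idea concrete, here is a minimal, hypothetical
sketch. It is not DataFusion's API: `IndexedBatch`, `repartition`, and
`merge_by_batch_index` are illustrative names, rows are plain `i64` values, and
round-robin distribution stands in for hash repartitioning to keep it short. It
assumes a single sorted input whose batches are tagged with their position
before being spread across partitions, so the merge compares batch indices only
and never looks at row values.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// A batch of rows tagged with its position in the original sorted stream.
/// (Hypothetical type for illustration only.)
#[derive(Debug, Clone, PartialEq)]
struct IndexedBatch {
    index: usize,
    rows: Vec<i64>,
}

/// Distribute batches round-robin over `n` partitions, keeping each batch's
/// original index. Round-robin is an assumption made to keep the sketch small.
fn repartition(batches: Vec<Vec<i64>>, n: usize) -> Vec<Vec<IndexedBatch>> {
    let mut parts: Vec<Vec<IndexedBatch>> = vec![Vec::new(); n];
    for (index, rows) in batches.into_iter().enumerate() {
        parts[index % n].push(IndexedBatch { index, rows });
    }
    parts
}

/// Merge the partitions back into one stream by comparing batch indices only.
fn merge_by_batch_index(parts: Vec<Vec<IndexedBatch>>) -> Vec<i64> {
    let mut iters: Vec<_> = parts
        .into_iter()
        .map(|p| p.into_iter().peekable())
        .collect();

    // Min-heap of (next batch index, partition number).
    let mut heap = BinaryHeap::new();
    for (p, it) in iters.iter_mut().enumerate() {
        if let Some(b) = it.peek() {
            heap.push(Reverse((b.index, p)));
        }
    }

    let mut out = Vec::new();
    while let Some(Reverse((_, p))) = heap.pop() {
        // Emit the whole batch with the smallest index, then re-queue
        // the partition if it still has batches left.
        let batch = iters[p].next().expect("peeked batch must exist");
        out.extend(batch.rows);
        if let Some(next) = iters[p].peek() {
            heap.push(Reverse((next.index, p)));
        }
    }
    out
}
```

   The heap holds at most one entry per partition, so each merge step costs a
comparison of integers per batch rather than a comparison of sort keys per row.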


