metesynnada commented on code in PR #7364:
URL: https://github.com/apache/arrow-datafusion/pull/7364#discussion_r1303563273
##########
datafusion/sqllogictest/test_files/order.slt:
##########
@@ -410,3 +410,38 @@ SELECT DISTINCT time as "first_seen" FROM t ORDER BY 1;
 ## Cleanup
 statement ok
 drop table t;
+
+# Create a table having 3 columns which are ordering equivalent by the source. In the next step,
+# we will expect to observe the removed sort exec by propagating the orders across projection.
+statement ok
+CREATE EXTERNAL TABLE multiple_ordered_table (
+  a0 INTEGER,
+  a INTEGER,
+  b INTEGER,
+  c INTEGER,
+  d INTEGER
+)
+STORED AS CSV
+WITH HEADER ROW
+WITH ORDER (a ASC)
+WITH ORDER (b ASC)
+WITH ORDER (c ASC)
+LOCATION '../core/tests/data/window_2.csv';
+
+query TT
+EXPLAIN SELECT (b+a+c) AS result
+FROM multiple_ordered_table
+ORDER BY result;
+----
+logical_plan
+Sort: result ASC NULLS LAST
+--Projection: multiple_ordered_table.b + multiple_ordered_table.a + multiple_ordered_table.c AS result
+----TableScan: multiple_ordered_table projection=[a, b, c]
+physical_plan
+SortPreservingMergeExec: [result@0 ASC NULLS LAST]

Review Comment:
Maintaining batch indices to preserve order is actually a good idea. The current sort-preserving algorithm, which was designed for order-preserving hash repartitioning, is tailored too closely to row-level sorting. We could collaborate on implementing this optimization in future work.
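The test expects the sort exec to be removed because `result = b + a + c` inherits an ascending order from its inputs: adding non-decreasing integer columns row by row yields a non-decreasing column. Below is a minimal Rust sketch of that reasoning only, not DataFusion's implementation. It assumes made-up sample data, no NULLs, and no integer overflow; `is_non_decreasing` and `project_sum` are hypothetical helpers, not DataFusion APIs.

```rust
/// Hypothetical helper: true if the slice is sorted in non-decreasing order.
fn is_non_decreasing(values: &[i64]) -> bool {
    values.windows(2).all(|w| w[0] <= w[1])
}

/// Hypothetical helper: evaluate the projected expression `b + a + c` row by row.
/// Assumes no overflow (plain `+` would panic in debug builds on overflow).
fn project_sum(a: &[i64], b: &[i64], c: &[i64]) -> Vec<i64> {
    a.iter().zip(b).zip(c).map(|((a, b), c)| b + a + c).collect()
}

fn main() {
    // Made-up sample data: each column is independently sorted ascending,
    // mirroring the `WITH ORDER (a ASC)`, `(b ASC)`, `(c ASC)` declarations.
    let a = [0, 0, 1, 2, 2];
    let b = [1, 3, 3, 4, 9];
    let c = [5, 5, 6, 6, 7];
    assert!(is_non_decreasing(&a) && is_non_decreasing(&b) && is_non_decreasing(&c));

    // The sum of non-decreasing columns is itself non-decreasing,
    // so `ORDER BY result` needs no additional sort.
    let result = project_sum(&a, &b, &c);
    assert!(is_non_decreasing(&result));
    println!("{:?}", result); // prints [6, 8, 10, 12, 18]
}
```

The same argument holds for any expression that is monotonic in each input (e.g. addition of ordered columns), which is why only the cross-partition `SortPreservingMergeExec` remains in the expected plan.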

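The batch-index idea discussed in the comment can be sketched as follows. This is a hypothetical illustration, not DataFusion's API: `Batch`, `round_robin_with_indices`, and `merge_by_index` are made-up names, and a batch is modeled as a plain `Vec<i32>`. Each batch is tagged with its position in the already ordered input before being distributed round-robin; the consumer restores the original order by comparing only these indices, never row values. It assumes whole batches are routed to partitions; hash repartitioning splits a batch's rows across partitions, so this sketch does not apply to it directly.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a record batch: just a vector of row values.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    rows: Vec<i32>,
}

/// Distribute batches round-robin across `n` partitions, tagging each batch
/// with its position in the (already ordered) input stream.
fn round_robin_with_indices(input: Vec<Batch>, n: usize) -> Vec<Vec<(usize, Batch)>> {
    let mut partitions: Vec<Vec<(usize, Batch)>> = vec![Vec::new(); n];
    for (idx, batch) in input.into_iter().enumerate() {
        partitions[idx % n].push((idx, batch));
    }
    partitions
}

/// Restore the original order by emitting batches in index order.
/// Only batch indices are compared; no row values are inspected.
fn merge_by_index(partitions: Vec<Vec<(usize, Batch)>>) -> Vec<Batch> {
    let mut pending: BTreeMap<usize, Batch> = BTreeMap::new();
    let mut next = 0;
    let mut out = Vec::new();
    // Simulate batches arriving partition by partition, i.e. out of order.
    for partition in partitions {
        for (idx, batch) in partition {
            pending.insert(idx, batch);
            // Emit every buffered batch whose turn has come.
            while let Some(b) = pending.remove(&next) {
                out.push(b);
                next += 1;
            }
        }
    }
    out
}

fn main() {
    let input: Vec<Batch> = (0..5)
        .map(|i| Batch { rows: vec![2 * i, 2 * i + 1] })
        .collect();
    let partitions = round_robin_with_indices(input.clone(), 3);
    let restored = merge_by_index(partitions);
    assert_eq!(restored, input);
    let rows: Vec<i32> = restored.iter().flat_map(|b| b.rows.clone()).collect();
    println!("{:?}", rows); // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
}
```

This sketch buffers out-of-order batches in a `BTreeMap` for simplicity; a streaming operator would more likely wait on the partition that owns the next index, which bounds memory to roughly one batch per partition.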