simicd commented on code in PR #8857:
URL: https://github.com/apache/arrow-datafusion/pull/8857#discussion_r1451617218


##########
datafusion/sqllogictest/test_files/order.slt:
##########
@@ -578,3 +578,192 @@ SortPreservingMergeExec: [log_c12_base_c11@0 DESC]
 
 statement ok
 drop table aggregate_test_100;
+
+
+# Sort with lots of repetition values
+# Test sorting a parquet file with 2 million records that has lots of values that are repeated
+statement ok
+CREATE EXTERNAL TABLE repeat_much STORED AS PARQUET LOCATION '../../parquet-testing/data/repeat_much.snappy.parquet';
+
+query I
+SELECT a FROM repeat_much ORDER BY a LIMIT 20;
+----
+2450962
+2450962
+2450962
+2450962
+2450962
+2450962
+2450962
+2450962
+2450962
+2450962
+2450962
+2450962
+2450962
+2450962
+2450962
+2450962
+2450962
+2450962
+2450962
+2450962
+
+
+# Create external table with optional pre-known sort order
+# DataFusion may take advantage of this ordering to omit sorts or use more efficient algorithms.
+statement ok
+set datafusion.catalog.information_schema = true;
+
+statement ok
+CREATE EXTERNAL TABLE dt (a_id integer, a_str string, a_bool boolean) STORED AS CSV WITH ORDER (a_id ASC) LOCATION 'file://path/to/table';
+
+#TODO: How to check for order in sqllogictest?

Review Comment:
   A question: how can I verify the table's declared sort order with pure SQL in sqllogictest?
   
   The following is what the Rust unit test asserts, but I couldn't find anything in the information schema that returns the `file_sort_order` setting.
   
https://github.com/apache/arrow-datafusion/blob/be361fdd8079a2f44da70f6af6e9d8eb3f7d0020/datafusion/core/tests/sql/order.rs#L60-L62



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
