akurmustafa commented on code in PR #12992:
URL: https://github.com/apache/datafusion/pull/12992#discussion_r1806736635


##########
datafusion/physical-expr/src/equivalence/ordering.rs:
##########
@@ -1065,4 +1066,63 @@ mod tests {
 
         Ok(())
     }
+
+    #[test]
+    fn test_ordering_satisfy_on_data() -> Result<()> {
+        let schema = create_test_schema()?;
+        let col_a = &col("a", &schema)?;
+        let col_b = &col("b", &schema)?;
+        let col_c = &col("c", &schema)?;
+        let col_d = &col("d", &schema)?;
+
+        let option_asc = SortOptions {
+            descending: false,
+            nulls_first: false,
+        };
+
+        let orderings = vec![
+            // [a ASC, b ASC, c ASC, d ASC]
+            vec![
+                (col_a, option_asc),
+                (col_b, option_asc),
+                (col_c, option_asc),
+                (col_d, option_asc),
+            ],
+            // [a ASC, c ASC, b ASC, d ASC]
+            vec![
+                (col_a, option_asc),
+                (col_c, option_asc),
+                (col_b, option_asc),
+                (col_d, option_asc),
+            ],
+        ];
+        let orderings = convert_to_orderings(&orderings);
+
+        let batch = generate_table_for_orderings(orderings, schema, 1000, 10)?;
+
+        // [a ASC, c ASC, d ASC] cannot be deduced
+        let ordering = vec![
+            (col_a, option_asc),
+            (col_c, option_asc),
+            (col_d, option_asc),
+        ];
+        let ordering = convert_to_orderings(&[ordering])[0].clone();
+        assert!(!is_table_same_after_sort(ordering, batch.clone())?);
+
+        // [a ASC, b ASC, d ASC] cannot be deduced
+        let ordering = vec![
+            (col_a, option_asc),
+            (col_b, option_asc),
+            (col_d, option_asc),
+        ];
+        let ordering = convert_to_orderings(&[ordering])[0].clone();
+        assert!(!is_table_same_after_sort(ordering, batch.clone())?);

Review Comment:
   This depends on the test batch generation parameters. For sufficiently large 
table sizes with enough cardinality, it is very hard to hit this case, so 
statistically the probability is very low. However, we can still encounter it 
in other use cases. Hence, if the expected result is counterintuitive, we 
should test the hypothesis over multiple runs with various parameters.
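
A minimal, self-contained sketch of what "multiple runs with various parameters" could look like. It assumes "this case" means the generated data happening to already satisfy an ordering that cannot be deduced. It deliberately does not use the DataFusion helpers from the diff: `Lcg`, `generate_sorted_table`, `is_same_after_sort`, `count_accidental_hits` and `report` are hypothetical names made up for illustration, and the table is only sorted by `[a, b, c, d]` (a simplification of the two orderings used in the test).

```rust
/// Tiny deterministic pseudo-random generator (an LCG), used only so that
/// this sketch needs no external crates. Hypothetical helper.
struct Lcg(u64);

impl Lcg {
    /// Returns a value in `0..n`. `n` must be non-zero.
    fn next_below(&mut self, n: u64) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 33) % n
    }
}

/// One row with columns a, b, c, d (indices 0..=3).
type Row = [u64; 4];

/// Generates `n_rows` random rows with `n_distinct` possible values per
/// column, then sorts them so that [a ASC, b ASC, c ASC, d ASC] holds.
fn generate_sorted_table(rng: &mut Lcg, n_rows: usize, n_distinct: u64) -> Vec<Row> {
    let mut rows: Vec<Row> = (0..n_rows)
        .map(|_| {
            [
                rng.next_below(n_distinct),
                rng.next_below(n_distinct),
                rng.next_below(n_distinct),
                rng.next_below(n_distinct),
            ]
        })
        .collect();
    rows.sort(); // arrays compare lexicographically: a, then b, c, d
    rows
}

/// Stable-sorts a copy of `rows` by the given column indices and reports
/// whether the table is unchanged, i.e. the data already satisfies it.
fn is_same_after_sort(rows: &[Row], cols: &[usize]) -> bool {
    let mut sorted = rows.to_vec();
    sorted.sort_by_key(|r| cols.iter().map(|&c| r[c]).collect::<Vec<u64>>());
    sorted.as_slice() == rows
}

/// Counts in how many of `n_runs` seeded runs the data happens to satisfy
/// [a ASC, b ASC, d ASC], which is not implied by [a, b, c, d].
fn count_accidental_hits(n_rows: usize, n_distinct: u64, n_runs: u64) -> u64 {
    (0..n_runs)
        .filter(|&seed| {
            let mut rng = Lcg(seed);
            let table = generate_sorted_table(&mut rng, n_rows, n_distinct);
            is_same_after_sort(&table, &[0, 1, 3])
        })
        .count() as u64
}

/// Sweeps a small grid of table sizes and cardinalities and prints how
/// often the accidental case was hit for each combination.
fn report() {
    for &n_rows in &[1usize, 10, 100, 1000] {
        for &n_distinct in &[1u64, 2, 10] {
            let hits = count_accidental_hits(n_rows, n_distinct, 20);
            println!("rows={n_rows:>4} distinct={n_distinct:>2} hits={hits}/20");
        }
    }
}
```

Calling `report()` from a `main` or a test gives a rough picture of how the hit rate changes with table size and cardinality; the same kind of sweep could wrap the `generate_table_for_orderings` call in the test above rather than relying on a single `(1000, 10)` configuration.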



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

