berkaysynnada commented on code in PR #12992: URL: https://github.com/apache/datafusion/pull/12992#discussion_r1806306901
########## datafusion/physical-expr/src/equivalence/ordering.rs: ########## @@ -1065,4 +1066,63 @@ mod tests { Ok(()) } + + #[test] + fn test_ordering_satisfy_on_data() -> Result<()> { + let schema = create_test_schema()?; + let col_a = &col("a", &schema)?; + let col_b = &col("b", &schema)?; + let col_c = &col("c", &schema)?; + let col_d = &col("d", &schema)?; + + let option_asc = SortOptions { + descending: false, + nulls_first: false, + }; + + let orderings = vec![ + // [a ASC, b ASC, c ASC, d ASC] + vec![ + (col_a, option_asc), + (col_b, option_asc), + (col_c, option_asc), + (col_d, option_asc), + ], + // [a ASC, c ASC, b ASC, d ASC] + vec![ + (col_a, option_asc), + (col_c, option_asc), + (col_b, option_asc), + (col_d, option_asc), + ], + ]; + let orderings = convert_to_orderings(&orderings); + + let batch = generate_table_for_orderings(orderings, schema, 1000, 10)?; + + // [a ASC, c ASC, d ASC] cannot be deduced + let ordering = vec![ + (col_a, option_asc), + (col_c, option_asc), + (col_d, option_asc), + ]; + let ordering = convert_to_orderings(&[ordering])[0].clone(); + assert!(!is_table_same_after_sort(ordering, batch.clone())?); + + // [a ASC, b ASC, d ASC] cannot be deduced + let ordering = vec![ + (col_a, option_asc), + (col_b, option_asc), + (col_d, option_asc), + ]; + let ordering = convert_to_orderings(&[ordering])[0].clone(); + assert!(!is_table_same_after_sort(ordering, batch.clone())?); Review Comment: IMO, if there are two identical tables, and one with sorted on `[a ASC, b ASC, c ASC, d ASC]` and `[a ASC, c ASC, b ASC, d ASC]`, and the other one sorted on [a ASC, c ASC, d ASC], the resulting tables can be again identical (moreover, it could be also possible for [a ASC, b ASC, d ASC] at the same time). If they come up so, the test will give error. Can we avoid it? ########## datafusion/physical-expr/src/equivalence/mod.rs: ########## @@ -446,11 +446,99 @@ mod tests { ( field.name(), // Generate random values for columns that do not occur in any of the groups (equivalence, ordering equivalence, constants) - elem.unwrap_or_else(|| generate_random_array(n_elem, n_distinct)), + elem.unwrap_or_else(|| { + generate_random_f64_array(n_elem, n_distinct, &mut rng) + }), ) }) .collect(); Ok(RecordBatch::try_from_iter(res)?) } + + // Helper function to get sort columns from a batch + fn get_sort_columns( + batch: &RecordBatch, + ordering: LexOrderingRef, + ) -> Result<Vec<SortColumn>> { + ordering + .iter() + .map(|expr| expr.evaluate_to_sort_column(batch)) + .collect::<Result<Vec<_>>>() + } + + // Utility function to generate random f64 array + fn generate_random_f64_array( + n_elems: usize, + n_distinct: usize, + rng: &mut StdRng, + ) -> ArrayRef { + let values: Vec<f64> = (0..n_elems) + .map(|_| rng.gen_range(0..n_distinct) as f64 / 2.0) Review Comment: Why `/ 2.0` ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org