alamb commented on code in PR #5307: URL: https://github.com/apache/arrow-datafusion/pull/5307#discussion_r1108761565
##########
datafusion/core/src/dataframe.rs:
##########
@@ -1097,15 +1097,22 @@ mod tests {
             .unwrap()
             .distinct()
             .unwrap()
-            .sort(vec![col("c2").sort(true, true)])
+            .sort(vec![col("c1").sort(true, true)])
             .unwrap();
+        let df_results = plan.clone().collect().await?;
+
+        #[rustfmt::skip]

Review Comment:
   This is a newly added test from https://github.com/apache/arrow-datafusion/pull/5258. If you look at the previous answers, they appear clearly incorrect to me -- there are duplicate values of `c1` produced.

   I updated the test to sort by `c1`, which avoids an error, and added a test that sorts by `c2` and shows the error.

##########
datafusion/core/tests/dataframe.rs:
##########
@@ -138,7 +138,7 @@ async fn sort_on_distinct_unprojected_columns() -> Result<()> {
         Arc::new(schema.clone()),
         vec![
             Arc::new(Int32Array::from_slice([1, 10, 10, 100])),
-            Arc::new(Int32Array::from_slice([2, 12, 12, 120])),
+            Arc::new(Int32Array::from_slice([2, 3, 4, 5])),

Review Comment:
   This is also a newly added test. With the old data, the distinct rows of (a, b) were the same as the distinct rows of (a) alone, so the test didn't show wrong results. If you change the data as above, the test does show incorrect results prior to this PR.

   I fixed the sort to be on a valid column and added a test for incorrect columns.
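To make the second comment concrete, here is a minimal std-only sketch of why the old data hid the problem. It is not DataFusion code. It assumes the wrong results came from the unprojected sort column being folded into the set of columns that DISTINCT operates over, and the helper names (`zip_rows`, `distinct_a`, `distinct_a_b_then_project_a`) are hypothetical.

```rust
use std::collections::BTreeSet;

/// Pair up the two test columns into rows of (a, b).
fn zip_rows(a: &[i32], b: &[i32]) -> Vec<(i32, i32)> {
    a.iter().copied().zip(b.iter().copied()).collect()
}

/// Expected semantics: deduplicate on `a` alone (output sorted by `a`).
fn distinct_a(rows: &[(i32, i32)]) -> Vec<i32> {
    let set: BTreeSet<i32> = rows.iter().map(|&(a, _)| a).collect();
    set.into_iter().collect()
}

/// Assumed faulty shape: deduplicate on (a, b), sort by `b`,
/// then drop `b` again. Duplicate `a` values can survive.
fn distinct_a_b_then_project_a(rows: &[(i32, i32)]) -> Vec<i32> {
    let set: BTreeSet<(i32, i32)> = rows.iter().copied().collect();
    let mut pairs: Vec<(i32, i32)> = set.into_iter().collect();
    pairs.sort_by_key(|&(_, b)| b);
    pairs.into_iter().map(|(a, _)| a).collect()
}

fn main() {
    let a = [1, 10, 10, 100];
    let old_b = [2, 12, 12, 120]; // data before this change
    let new_b = [2, 3, 4, 5]; // data after this change

    let old_rows = zip_rows(&a, &old_b);
    let new_rows = zip_rows(&a, &new_b);

    // Old data: both approaches agree, so the test could not fail.
    println!("old, distinct(a):    {:?}", distinct_a(&old_rows));
    println!("old, distinct(a, b): {:?}", distinct_a_b_then_project_a(&old_rows));

    // New data: the (a, b) variant keeps both rows with a = 10.
    println!("new, distinct(a):    {:?}", distinct_a(&new_rows));
    println!("new, distinct(a, b): {:?}", distinct_a_b_then_project_a(&new_rows));
}
```

With the old `b` column both functions return the same three values, so the original test passed either way. With the new `b` column the (a, b) variant returns four values, including 10 twice, which is the duplicate output described above.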