alamb commented on code in PR #5307:
URL: https://github.com/apache/arrow-datafusion/pull/5307#discussion_r1108761565
##########
datafusion/core/src/dataframe.rs:
##########
@@ -1097,15 +1097,22 @@ mod tests {
.unwrap()
.distinct()
.unwrap()
- .sort(vec![col("c2").sort(true, true)])
+ .sort(vec![col("c1").sort(true, true)])
.unwrap();
+
let df_results = plan.clone().collect().await?;
+
+ #[rustfmt::skip]
Review Comment:
This test was newly added in
https://github.com/apache/arrow-datafusion/pull/5258
The previous expected results look clearly incorrect to me: they contain
duplicate values of `c1`. I updated this test to sort by `c1` so that it avoids
an error, and added a test that sorts by `c2` and shows the error
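To illustrate how duplicate `c1` values can appear, here is a minimal standalone sketch. It assumes (this is not confirmed above) that the old plan computed DISTINCT over whole `(c1, c2)` rows so that `c2` was available for sorting, and only then dropped `c2`. The function names and the sample rows are hypothetical, and nothing here uses DataFusion itself.

```rust
use std::collections::BTreeSet;

/// Hypothetical model of the suspect plan shape: DISTINCT over whole
/// (c1, c2) rows, then sort by c2, then keep only c1.
/// This is an illustration, not DataFusion's actual implementation.
fn distinct_rows_then_project_c1<'a>(rows: &[(&'a str, i32)]) -> Vec<&'a str> {
    // Deduplicate on the full (c1, c2) pair.
    let unique: BTreeSet<(&'a str, i32)> = rows.iter().copied().collect();
    let mut sorted: Vec<(&'a str, i32)> = unique.into_iter().collect();
    // Sort by the column that is not part of the final output.
    sorted.sort_by_key(|&(_, c2)| c2);
    // Drop c2; any c1 that was paired with several c2 values now repeats.
    sorted.into_iter().map(|(c1, _)| c1).collect()
}

/// Returns true if any value occurs more than once.
fn has_duplicates(values: &[&str]) -> bool {
    let unique: BTreeSet<&str> = values.iter().copied().collect();
    unique.len() != values.len()
}
```

Under that assumption, rows such as `("a", 1), ("a", 2), ("b", 3)` are all distinct as pairs, so the projected `c1` column still contains `"a"` twice even though DISTINCT was requested.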
##########
datafusion/core/tests/dataframe.rs:
##########
@@ -138,7 +138,7 @@ async fn sort_on_distinct_unprojected_columns() -> Result<()> {
Arc::new(schema.clone()),
vec![
Arc::new(Int32Array::from_slice([1, 10, 10, 100])),
- Arc::new(Int32Array::from_slice([2, 12, 12, 120])),
+ Arc::new(Int32Array::from_slice([2, 3, 4, 5])),
Review Comment:
This is also a newly added test. Because the values of (a, b) were distinct
exactly where the values of (a) were, the old data didn't show wrong results.
If you change the data as above, the test does show incorrect results prior to
this PR.
I fixed the sort to be on a valid column and added a test for incorrect
columns
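A quick way to see why the old data hid the problem is to count distinct values of (a) against distinct (a, b) pairs for both versions of the data. This is a standalone sketch using only the arrays from the diff; the helper names are hypothetical, and it assumes the first array is column `a` and the second is column `b`.

```rust
use std::collections::HashSet;

/// Number of distinct values in column a.
fn distinct_a_count(a: &[i32]) -> usize {
    a.iter().copied().collect::<HashSet<i32>>().len()
}

/// Number of distinct (a, b) pairs, pairing the columns row by row.
fn distinct_pair_count(a: &[i32], b: &[i32]) -> usize {
    a.iter()
        .copied()
        .zip(b.iter().copied())
        .collect::<HashSet<(i32, i32)>>()
        .len()
}

/// True when deduplicating on (a, b) yields a different row count than
/// deduplicating on (a) alone, i.e. when the data can expose the issue.
fn data_exposes_difference(a: &[i32], b: &[i32]) -> bool {
    distinct_a_count(a) != distinct_pair_count(a, b)
}
```

With `a = [1, 10, 10, 100]`, the old `b = [2, 12, 12, 120]` gives 3 distinct values of (a) and 3 distinct pairs, so the two notions of DISTINCT agree. The new `b = [2, 3, 4, 5]` gives 4 distinct pairs against 3 distinct values of (a), which is what lets the test tell them apart.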