[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5307: Allow DISTINCT with ORDER BY and an aliased select list

via GitHub Thu, 16 Feb 2023 09:02:18 -0800


alamb commented on code in PR #5307:
URL: https://github.com/apache/arrow-datafusion/pull/5307#discussion_r1108761565



##########
datafusion/core/src/dataframe.rs:
##########
@@ -1097,15 +1097,22 @@ mod tests {
             .unwrap()
             .distinct()
             .unwrap()
-            .sort(vec![col("c2").sort(true, true)])
+            .sort(vec![col("c1").sort(true, true)])
             .unwrap();
+
         let df_results = plan.clone().collect().await?;
+
+        #[rustfmt::skip]

Review Comment:
   This is a test newly added in
https://github.com/apache/arrow-datafusion/pull/5258

   If you look at the previous expected results, they appear clearly incorrect to me:
duplicate values of `c1` are produced. I updated the test to sort by `c1`, which
avoids an error, and also added a test that sorts by `c2` and shows the error.
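One plausible way duplicate `c1` values can arise (an assumption for illustration, not a reading of the PR #5258 code) is if the ORDER BY column is silently folded into the DISTINCT key and then projected away. The std-only sketch below models that with made-up rows, and also models the rule PostgreSQL enforces: for `SELECT DISTINCT`, every ORDER BY expression must appear in the select list. `distinct_with_hidden_sort_key` and `check_distinct_order_by` are hypothetical helpers, not DataFusion APIs.

```rust
use std::collections::BTreeSet;

/// Hypothetical model of a faulty rewrite: the ORDER BY column `c2` is added
/// to the DISTINCT key, the rows are sorted by `c2`, and only `c1` is projected.
fn distinct_with_hidden_sort_key(rows: &[(i32, i32)]) -> Vec<i32> {
    // DISTINCT over (c1, c2), not over c1 alone
    let distinct_pairs: BTreeSet<(i32, i32)> = rows.iter().copied().collect();
    let mut pairs: Vec<(i32, i32)> = distinct_pairs.into_iter().collect();
    // ORDER BY c2
    pairs.sort_by_key(|&(_, c2)| c2);
    // Project c1: any c1 paired with several c2 values now appears more than once
    pairs.into_iter().map(|(c1, _)| c1).collect()
}

/// Hypothetical check: reject ORDER BY columns missing from the SELECT DISTINCT list.
fn check_distinct_order_by(select_list: &[&str], order_by: &[&str]) -> Result<(), String> {
    for col in order_by {
        if !select_list.contains(col) {
            return Err(format!(
                "ORDER BY column `{col}` must appear in the SELECT DISTINCT list"
            ));
        }
    }
    Ok(())
}

fn main() {
    // Made-up (c1, c2) rows: c1 = 10 is paired with two different c2 values
    let rows = [(1, 2), (10, 3), (10, 4), (100, 5)];
    println!("{:?}", distinct_with_hidden_sort_key(&rows)); // [1, 10, 10, 100]

    // Sorting by a selected column is fine; sorting by an unselected one is rejected
    assert!(check_distinct_order_by(&["c1"], &["c1"]).is_ok());
    assert!(check_distinct_order_by(&["c1"], &["c2"]).is_err());
}
```

Rejecting the query is the conservative choice: when one `c1` value maps to several `c2` values, there is no single position for that `c1` row in an ordering by `c2`.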



##########
datafusion/core/tests/dataframe.rs:
##########
@@ -138,7 +138,7 @@ async fn sort_on_distinct_unprojected_columns() -> Result<()> {
         Arc::new(schema.clone()),
         vec![
             Arc::new(Int32Array::from_slice([1, 10, 10, 100])),
-            Arc::new(Int32Array::from_slice([2, 12, 12, 120])),
+            Arc::new(Int32Array::from_slice([2, 3, 4, 5])),

Review Comment:
   This is also a newly added test. Because the (a, b) pairs were distinct exactly
where the values of (a) were, it didn't show wrong results. If you change this data
as above, then prior to this PR it does show incorrect results.

   I fixed the sort to be on a valid column and added a test for sorting on an
invalid column.
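The condition described above can be checked mechanically. The std-only sketch below, assuming the first array in the test batch is column `a` and the second is `b` (as the comment's "(a, b)" suggests), reports whether some `a` value is paired with more than one `b` value. Only then does DISTINCT over (a, b) return more rows than DISTINCT over (a), so only then can an unprojected sort column leak into the result as duplicate `a` values. `exposes_hidden_key_bug` is a hypothetical helper.

```rust
use std::collections::HashSet;

/// Hypothetical helper: true if DISTINCT over (a, b) yields more rows than
/// DISTINCT over a, i.e. some value of `a` is paired with several values of `b`.
fn exposes_hidden_key_bug(a: &[i32], b: &[i32]) -> bool {
    let distinct_a: HashSet<i32> = a.iter().copied().collect();
    let distinct_ab: HashSet<(i32, i32)> =
        a.iter().copied().zip(b.iter().copied()).collect();
    // Projecting (a, b) onto a can only merge rows, so the counts are equal
    // exactly when each a value maps to a single b value.
    distinct_ab.len() > distinct_a.len()
}

fn main() {
    let a = [1, 10, 10, 100];
    let old_b = [2, 12, 12, 120]; // data before the change in this diff
    let new_b = [2, 3, 4, 5]; // data after the change in this diff
    println!("old data exposes bug: {}", exposes_hidden_key_bug(&a, &old_b)); // false
    println!("new data exposes bug: {}", exposes_hidden_key_bug(&a, &new_b)); // true
}
```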



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
