alamb commented on a change in pull request #984:
URL: https://github.com/apache/arrow-datafusion/pull/984#discussion_r705557467



##########
File path: datafusion/src/physical_plan/sort_preserving_merge.rs
##########
@@ -1156,10 +1156,24 @@ mod tests {
     #[tokio::test]
     async fn test_async() {
         let schema = test::aggr_test_schema();
-        let sort = vec![PhysicalSortExpr {
-            expr: col("c7", &schema).unwrap(),
-            options: SortOptions::default(),
-        }];
+        let sort = vec![

Review comment:
       This is required to ensure a consistent sort between the two sorting 
strategies due to the change to use unstable sorting, introduced in 
https://github.com/apache/arrow-rs/pull/552. The `SortPreservingMerge` operator 
happens to be a stable sort but the sort kernel used by the `Sort` Operator no 
longer is.
   
   The issue here is that the original sort key, `c7` has duplicated values as 
can be seen in this screenshot (e.g the value `18` is repeated in several rows):
   ![Screen Shot 2021-09-09 at 1 21 05 
PM](https://user-images.githubusercontent.com/490673/132734478-3e57144a-addc-422c-9e96-27b3239ce20e.png)
   
   The full output is in  
[basic.txt](https://github.com/apache/arrow-datafusion/files/7138649/basic.txt)
   and  
[partition.txt](https://github.com/apache/arrow-datafusion/files/7138650/partition.txt)
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to