DDtKey opened a new issue, #9091:
URL: https://github.com/apache/arrow-datafusion/issues/9091

   ### Describe the bug
   
   I observe that the query result is not sorted correctly. And it worked 
before (datafusion 31.0.0 at least)
   
   ### To Reproduce
   
   ```sql
   CREATE TABLE companies(name VARCHAR, employees INT) AS VALUES ('Jeyork', 
150),('Shalk', 350),('ShuttlP', 75);
   CREATE TABLE leads(name VARCHAR, company VARCHAR) AS VALUES ('Alex F', 
'Jeyork'),('John B', 'Shalk'),('Samanta J', 'ShuttlP'),('Trevor R', 
'Jeyork'),('Alice B', 'ShuttlP');
   
   SELECT l.* FROM leads l LEFT JOIN companies c ON c."name" = l."company" 
ORDER BY c."name", l."name";
   +-----------+---------+
   | name      | company |
   +-----------+---------+
   | Alex F    | Jeyork  |
   | Alice B   | ShuttlP |
   | John B    | Shalk   |
   | Samanta J | ShuttlP |
   | Trevor R  | Jeyork  |
   +-----------+---------+
   5 rows in set. Query took 0.002 seconds.
   
   
   ```
   <details>
   <summary>EXPLAIN</summary>
   
   ```
   
+---------------+-------------------------------------------------------------------------------------+
   | plan_type     | plan                                                       
                         |
   
+---------------+-------------------------------------------------------------------------------------+
   | logical_plan  | Projection: l.name, l.company                              
                         |
   |               |   Sort: c.name ASC NULLS LAST, l.name ASC NULLS LAST       
                         |
   |               |     Left Join: l.company = c.name                          
                         |
   |               |       SubqueryAlias: l                                     
                         |
   |               |         TableScan: leads projection=[name, company]        
                         |
   |               |       SubqueryAlias: c                                     
                         |
   |               |         TableScan: companies projection=[name]             
                         |
   | physical_plan | SortPreservingMergeExec: [name@0 ASC NULLS LAST,name@0 ASC 
NULLS LAST]              |
   |               |   SortExec: expr=[name@0 ASC NULLS LAST,name@0 ASC NULLS 
LAST]                      |
   |               |     ProjectionExec: expr=[name@1 as name, company@2 as 
company]                     |
   |               |       CoalesceBatchesExec: target_batch_size=8192          
                         |
   |               |         HashJoinExec: mode=Partitioned, join_type=Right, 
on=[(name@0, company@1)]   |
   |               |           CoalesceBatchesExec: target_batch_size=8192      
                         |
   |               |             RepartitionExec: partitioning=Hash([name@0], 
10), input_partitions=1    |
   |               |               MemoryExec: partitions=1, 
partition_sizes=[1]                         |
   |               |           CoalesceBatchesExec: target_batch_size=8192      
                         |
   |               |             RepartitionExec: 
partitioning=Hash([company@1], 10), input_partitions=1 |
   |               |               MemoryExec: partitions=1, 
partition_sizes=[1]                         |
   |               |                                                            
                         |
   
+---------------+-------------------------------------------------------------------------------------+
   2 rows in set. Query took 0.009 seconds.
   
   ```
   </details>
   
   It should be sorted by `name` of table `company`, but we can see it's mixed.
   
   
   Interesting behavior is that **if I add `c.name` to the query, it will be 
sorted correctly!**
   
   ```sql
   SELECT l.*, c.name FROM leads l LEFT JOIN companies c ON c."name" = 
l."company" ORDER BY c."name", l."name";
   +-----------+---------+---------+
   | name      | company | name    |
   +-----------+---------+---------+
   | Alex F    | Jeyork  | Jeyork  |
   | Trevor R  | Jeyork  | Jeyork  |
   | John B    | Shalk   | Shalk   |
   | Alice B   | ShuttlP | ShuttlP |
   | Samanta J | ShuttlP | ShuttlP |
   +-----------+---------+---------+
   5 rows in set. Query took 0.002 seconds.
   ```
   
   <details>
   <summary>EXPLAIN</summary>
   
   ```
   
+---------------+-------------------------------------------------------------------------------------+
   | plan_type     | plan                                                       
                         |
   
+---------------+-------------------------------------------------------------------------------------+
   | logical_plan  | Sort: c.name ASC NULLS LAST, l.name ASC NULLS LAST         
                         |
   |               |   Left Join: l.company = c.name                            
                         |
   |               |     SubqueryAlias: l                                       
                         |
   |               |       TableScan: leads projection=[name, company]          
                         |
   |               |     SubqueryAlias: c                                       
                         |
   |               |       TableScan: companies projection=[name]               
                         |
   | physical_plan | SortPreservingMergeExec: [name@2 ASC NULLS LAST,name@0 ASC 
NULLS LAST]              |
   |               |   SortExec: expr=[name@2 ASC NULLS LAST,name@0 ASC NULLS 
LAST]                      |
   |               |     ProjectionExec: expr=[name@1 as name, company@2 as 
company, name@0 as name]     |
   |               |       CoalesceBatchesExec: target_batch_size=8192          
                         |
   |               |         HashJoinExec: mode=Partitioned, join_type=Right, 
on=[(name@0, company@1)]   |
   |               |           CoalesceBatchesExec: target_batch_size=8192      
                         |
   |               |             RepartitionExec: partitioning=Hash([name@0], 
10), input_partitions=1    |
   |               |               MemoryExec: partitions=1, 
partition_sizes=[1]                         |
   |               |           CoalesceBatchesExec: target_batch_size=8192      
                         |
   |               |             RepartitionExec: 
partitioning=Hash([company@1], 10), input_partitions=1 |
   |               |               MemoryExec: partitions=1, 
partition_sizes=[1]                         |
   |               |                                                            
                         |
   
+---------------+-------------------------------------------------------------------------------------+
   2 rows in set. Query took 0.002 seconds.
   ```
   </details>
   
   ### Expected behavior
   
   Order should be correct
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to