DDtKey opened a new issue, #9091:
URL: https://github.com/apache/arrow-datafusion/issues/9091
### Describe the bug
I observe that the query result is not sorted correctly. And it worked
before (datafusion 31.0.0 at least)
### To Reproduce
```sql
CREATE TABLE companies(name VARCHAR, employees INT) AS VALUES ('Jeyork',
150),('Shalk', 350),('ShuttlP', 75);
CREATE TABLE leads(name VARCHAR, company VARCHAR) AS VALUES ('Alex F',
'Jeyork'),('John B', 'Shalk'),('Samanta J', 'ShuttlP'),('Trevor R',
'Jeyork'),('Alice B', 'ShuttlP');
SELECT l.* FROM leads l LEFT JOIN companies c ON c."name" = l."company"
ORDER BY c."name", l."name";
+-----------+---------+
| name | company |
+-----------+---------+
| Alex F | Jeyork |
| Alice B | ShuttlP |
| John B | Shalk |
| Samanta J | ShuttlP |
| Trevor R | Jeyork |
+-----------+---------+
5 rows in set. Query took 0.002 seconds.
```
<details>
<summary>EXPLAIN</summary>
```
+---------------+-------------------------------------------------------------------------------------+
| plan_type | plan
|
+---------------+-------------------------------------------------------------------------------------+
| logical_plan | Projection: l.name, l.company
|
| | Sort: c.name ASC NULLS LAST, l.name ASC NULLS LAST
|
| | Left Join: l.company = c.name
|
| | SubqueryAlias: l
|
| | TableScan: leads projection=[name, company]
|
| | SubqueryAlias: c
|
| | TableScan: companies projection=[name]
|
| physical_plan | SortPreservingMergeExec: [name@0 ASC NULLS LAST,name@0 ASC
NULLS LAST] |
| | SortExec: expr=[name@0 ASC NULLS LAST,name@0 ASC NULLS
LAST] |
| | ProjectionExec: expr=[name@1 as name, company@2 as
company] |
| | CoalesceBatchesExec: target_batch_size=8192
|
| | HashJoinExec: mode=Partitioned, join_type=Right,
on=[(name@0, company@1)] |
| | CoalesceBatchesExec: target_batch_size=8192
|
| | RepartitionExec: partitioning=Hash([name@0],
10), input_partitions=1 |
| | MemoryExec: partitions=1,
partition_sizes=[1] |
| | CoalesceBatchesExec: target_batch_size=8192
|
| | RepartitionExec:
partitioning=Hash([company@1], 10), input_partitions=1 |
| | MemoryExec: partitions=1,
partition_sizes=[1] |
| |
|
+---------------+-------------------------------------------------------------------------------------+
2 rows in set. Query took 0.009 seconds.
```
</details>
It should be sorted by `name` of table `company`, but we can see it's mixed.
Interesting behavior is that **if I add `c.name` to the query, it will be
sorted correctly!**
```sql
SELECT l.*, c.name FROM leads l LEFT JOIN companies c ON c."name" =
l."company" ORDER BY c."name", l."name";
+-----------+---------+---------+
| name | company | name |
+-----------+---------+---------+
| Alex F | Jeyork | Jeyork |
| Trevor R | Jeyork | Jeyork |
| John B | Shalk | Shalk |
| Alice B | ShuttlP | ShuttlP |
| Samanta J | ShuttlP | ShuttlP |
+-----------+---------+---------+
5 rows in set. Query took 0.002 seconds.
```
<details>
<summary>EXPLAIN</summary>
```
+---------------+-------------------------------------------------------------------------------------+
| plan_type | plan
|
+---------------+-------------------------------------------------------------------------------------+
| logical_plan | Sort: c.name ASC NULLS LAST, l.name ASC NULLS LAST
|
| | Left Join: l.company = c.name
|
| | SubqueryAlias: l
|
| | TableScan: leads projection=[name, company]
|
| | SubqueryAlias: c
|
| | TableScan: companies projection=[name]
|
| physical_plan | SortPreservingMergeExec: [name@2 ASC NULLS LAST,name@0 ASC
NULLS LAST] |
| | SortExec: expr=[name@2 ASC NULLS LAST,name@0 ASC NULLS
LAST] |
| | ProjectionExec: expr=[name@1 as name, company@2 as
company, name@0 as name] |
| | CoalesceBatchesExec: target_batch_size=8192
|
| | HashJoinExec: mode=Partitioned, join_type=Right,
on=[(name@0, company@1)] |
| | CoalesceBatchesExec: target_batch_size=8192
|
| | RepartitionExec: partitioning=Hash([name@0],
10), input_partitions=1 |
| | MemoryExec: partitions=1,
partition_sizes=[1] |
| | CoalesceBatchesExec: target_batch_size=8192
|
| | RepartitionExec:
partitioning=Hash([company@1], 10), input_partitions=1 |
| | MemoryExec: partitions=1,
partition_sizes=[1] |
| |
|
+---------------+-------------------------------------------------------------------------------------+
2 rows in set. Query took 0.002 seconds.
```
</details>
### Expected behavior
Order should be correct
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]