xudong963 commented on pull request #1339:
URL: https://github.com/apache/arrow-datafusion/pull/1339#issuecomment-981733521
> @xudong963 thank you for the examples
>
> Given this input:
>
> ```sql
> ❯ create table table1 as SELECT * FROM (VALUES (1), (2), (null)) as t;
> 0 rows in set. Query took 0.006 seconds.
> ❯ create table table2 as SELECT * FROM (VALUES (1), (3), (null)) as t;
> 0 rows in set. Query took 0.003 seconds.
> ```
>
> The plan that comes out is incorrect:
>
> ```sql
> ❯ explain SELECT *
> FROM table1
> LEFT JOIN table2
> ON table1.column1 = table2.column1
> WHERE table2.column1 IS NOT NULL;
>
+---------------+---------------------------------------------------------------------------------------------------------------------------------------+
> | plan_type | plan
|
>
+---------------+---------------------------------------------------------------------------------------------------------------------------------------+
> | logical_plan | Projection: #table1.column1, #table2.column1
|
> | | Join: #table1.column1 = #table2.column1
|
> | | Filter: #table1.column1 IS NOT NULL
|
> | | TableScan: table1 projection=Some([0])
|
> | | Filter: #table2.column1 IS NOT NULL
|
> | | TableScan: table2 projection=Some([0])
|
> ```
>
> I think the problem is this line:
>
> ```
> | | Filter: #table1.column1 IS NOT NULL
> ```
I changed my code according to your comments and got the following result
```sql
❯ explain SELECT * FROM table1 LEFT JOIN table2 ON table1.column1 =
table2.column1 WHERE table2.column1 IS NOT NULL;
+---------------+---------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+---------------+---------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: #table1.column1, #table2.column1
|
| | Join: #table1.column1 = #table2.column1
|
| | TableScan: table1 projection=Some([0])
|
| | Filter: #table2.column1 IS NOT NULL
|
| | TableScan: table2 projection=Some([0])
|
| physical_plan | ProjectionExec: expr=[column1@0 as column1, column1@1 as
column1] |
| | CoalesceBatchesExec: target_batch_size=4096
|
| | HashJoinExec: mode=Partitioned, join_type=Left,
on=[(Column { name: "column1", index: 0 }, Column { name: "column1", index: 0
})] |
| | CoalesceBatchesExec: target_batch_size=4096
|
| | RepartitionExec: partitioning=Hash([Column { name:
"column1", index: 0 }], 8) |
| | MemoryExec: partitions=8, partition_sizes=[1, 0,
0, 0, 0, 0, 0, 0] |
| | CoalesceBatchesExec: target_batch_size=4096
|
| | RepartitionExec: partitioning=Hash([Column { name:
"column1", index: 0 }], 8) |
| | CoalesceBatchesExec: target_batch_size=4096
|
| | FilterExec: column1@0 IS NOT NULL
|
| | MemoryExec: partitions=8,
partition_sizes=[1, 0, 0, 0, 0, 0, 0, 0]
|
| |
|
+---------------+---------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.013 seconds.
❯ SELECT * FROM table1 LEFT JOIN table2 ON table1.column1 = table2.column1
WHERE table2.column1 IS NOT NULL;
+---------+---------+
| column1 | column1 |
+---------+---------+
| 2 | |
| | |
| 1 | 1 |
+---------+---------+
3 rows in set. Query took 0.014 seconds.
```
But postgresql's result is different
```sql
postgres=# SELECT * FROM table1 LEFT JOIN table2 ON table1.column1 =
table2.column1 WHERE table2.column1 IS NOT NULL
postgres-# ;
column1 | column1
---------+---------
1 | 1
(1 row)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]