xudong963 edited a comment on pull request #1339:
URL: https://github.com/apache/arrow-datafusion/pull/1339#issuecomment-981117810
@alamb @Dandandan
### Case1
In Datafusion:
```sql
❯ select * from C;
+---------+
| column1 |
+---------+
| 1 |
| 2 |
+---------+
2 rows in set. Query took 0.004 seconds.
❯ select * from D;
+---------+
| column1 |
+---------+
| 1 |
| 2 |
+---------+
2 rows in set. Query took 0.004 seconds.
❯ select * from C LEFT JOIN D ON (C.column1 = D.column1 and D.column1>5);
+---------+---------+
| column1 | column1 |
+---------+---------+
| 2 | |
| 1 | |
+---------+---------+
2 rows in set. Query took 0.011 seconds.
select * from C LEFT JOIN D ON (C.column1 = D.column1 and C.column1>5);
NotImplemented("Unsupported expressions in Left JOIN: [#C.column1 >
Int64(5)]")
```
vs
In postgres:
```sql
postgres=# select * from C;
c
---
1
2
(2 rows)
postgres=# select * from D;
d
---
2
1
(2 rows)
postgres=# select * from C LEFT JOIN D ON (C.c = D.d and D.d>5);
c | d
---+---
1 |
2 |
(2 rows)
postgres=# select * from C LEFT JOIN D ON (C.c = D.d and C.c>5);
c | d
---+---
1 |
2 |
(2 rows)
```
If filters in the `On` clause are about the non preserved join side, we got
the same result between Datafusion and Postgresql. Notes that datafusion
doesn't support filters in the `On` clause are about the **preserved** join
side, so in the ticket let's put that aside for the moment.
---
### Case2 (contains null row)
In datafusion:
```sql
❯ SELECT * FROM table1;
+---------+
| column1 |
+---------+
| 1 |
| 2 |
| |
+---------+
3 rows in set. Query took 0.005 seconds.
❯ SELECT * FROM table2;
+---------+
| column1 |
+---------+
| 1 |
| 3 |
| |
+---------+
3 rows in set. Query took 0.003 seconds.
❯
❯ SELECT *
FROM table1
LEFT JOIN table2
ON table1.column1 = table2.column1
and table2.column1 IS NOT NULL;
+---------+---------+
| column1 | column1 |
+---------+---------+
| 2 | |
| | |
| 1 | 1 |
+---------+---------+
3 rows in set. Query took 0.011 seconds.
❯ SELECT *
FROM table1
LEFT JOIN table2
ON table1.column1 = table2.column1
and table1.column1 IS NOT NULL;
NotImplemented("Unsupported expressions in Left JOIN: [#table1.column1 IS
NOT NULL]")
```
vs
In postgresql:
```sql
postgres=# SELECT *
FROM table1
LEFT JOIN table2
ON table1.column1 = table2.column1
and table2.column1 IS NOT NULL;
column1 | column1
---------+---------
1 | 1
2 |
|
(3 rows)
postgres=# SELECT *
FROM table1
LEFT JOIN table2
ON table1.column1 = table2.column1
and table1.column1 IS NOT NULL;
column1 | column1
---------+---------
1 | 1
2 |
|
(3 rows)
```
Case1 and Case2 show the same behavior, this is not a problem with IS NULL
and IS NOT NULL.
---
### Case 3
In datafusion:
```sql
❯ select * from C LEFT JOIN D ON (C.column1 = D.column1) where C.column1>5;
0 rows in set. Query took 0.013 seconds.
❯ select * from C LEFT JOIN D ON (C.column1 = D.column1) where D.column1>5;
0 rows in set. Query took 0.012 seconds.
```
In postgresql:
```sql
postgres=# select * from C LEFT JOIN D ON (C.c = D.d) where C.c>5;
c | d
---+---
(0 rows)
postgres=# select * from C LEFT JOIN D ON (C.c = D.d) where D.d>5;
c | d
---+---
(0 rows)
```
They are consistent.
---
Case4(contains null row)
In datafusion:
```sql
❯ SELECT * FROM table1;
+---------+
| column1 |
+---------+
| 1 |
| 2 |
| |
+---------+
3 rows in set. Query took 0.005 seconds.
❯ SELECT * FROM table2;
+---------+
| column1 |
+---------+
| 1 |
| 3 |
| |
+---------+
3 rows in set. Query took 0.003 seconds.
❯
❯ SELECT *
FROM table1
LEFT JOIN table2
ON table1.column1 = table2.column1
WHERE table2.column1 IS NOT NULL;
+---------+---------+
| column1 | column1 |
+---------+---------+
| 2 | |
| 1 | 1 |
+---------+---------+
2 rows in set. Query took 0.013 seconds.
```
vs
In postgresql:
```sql
postgres=# SELECT *
FROM table1
LEFT JOIN table2
ON table1.column1 = table2.column1
WHERE table2.column1 IS NOT NULL;
column1 | column1
---------+---------
1 | 1
(1 row)
```
If the `WHERE` clause contains `NULL` expr, we shouldn't push down the
filter expr in `WHERE`.
```
+---------------+---------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+---------------+---------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: #table1.column1, #table2.column1
|
| | Join: #table1.column1 = #table2.column1
|
| | Filter: #table1.column1 IS NOT NULL
|
| | TableScan: table1 projection=Some([0])
|
| | Filter: #table2.column1 IS NOT NULL
|
| | TableScan: table2 projection=Some([0])
```
should be
```
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: #table1.column1, #table2.column1
|
| | Filter: #table1.column1 IS NOT NULL
|
| | Join: #table1.column1 = #table2.column1
|
| | TableScan: table1 projection=Some([0])
|
| | TableScan: table2 projection=Some([0])
|
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]