xudong963 edited a comment on pull request #1339:
URL: https://github.com/apache/arrow-datafusion/pull/1339#issuecomment-981117810


   @alamb @Dandandan 
   ### Case1
   In Datafusion:
   
   ```sql
   ❯ select * from C;
   +---------+
   | column1 |
   +---------+
   | 1       |
   | 2       |
   +---------+
   2 rows in set. Query took 0.004 seconds.
   ❯ select * from D;
   +---------+
   | column1 |
   +---------+
   | 1       |
   | 2       |
   +---------+
   2 rows in set. Query took 0.004 seconds.
   
   ❯  select * from C LEFT JOIN D ON (C.column1 = D.column1 and D.column1>5);
   +---------+---------+
   | column1 | column1 |
   +---------+---------+
   | 2       |         |
   | 1       |         |
   +---------+---------+
   2 rows in set. Query took 0.011 seconds.
   
   select * from C LEFT JOIN D ON (C.column1 = D.column1 and C.column1>5);
   NotImplemented("Unsupported expressions in Left JOIN: [#C.column1 > 
Int64(5)]")
   ```
   
   vs
   
   In postgres:
   
   ```sql
   postgres=# select * from C;
    c
   ---
    1
    2
   (2 rows)
   
   postgres=# select * from D;
    d
   ---
    2
    1
   (2 rows)
   
   postgres=# select * from C LEFT JOIN D ON (C.c = D.d and D.d>5);
    c | d
   ---+---
    1 |
    2 |
   (2 rows)
   
   postgres=# select * from C LEFT JOIN D ON (C.c = D.d and C.c>5);
    c | d
   ---+---
    1 |
    2 |
   (2 rows)
   ```
   
   If filters in the `On` clause are about the non preserved join side, we got 
the same result between Datafusion and Postgresql. Notes that datafusion 
doesn't support filters in the `On` clause are about the **preserved** join 
side, so in the ticket let's put that aside for the moment.
   
   ---
   
   ### Case2 (contains null row)
   In datafusion:
   ```sql
   ❯ SELECT * FROM table1;
   +---------+
   | column1 |
   +---------+
   | 1       |
   | 2       |
   |         |
   +---------+
   3 rows in set. Query took 0.005 seconds.
   
   ❯ SELECT * FROM table2;
   +---------+
   | column1 |
   +---------+
   | 1       |
   | 3       |
   |         |
   +---------+
   3 rows in set. Query took 0.003 seconds.
   ❯
   
   ❯ SELECT *
   FROM table1
   LEFT JOIN table2
       ON table1.column1 = table2.column1
   and table2.column1 IS NOT NULL;
   +---------+---------+
   | column1 | column1 |
   +---------+---------+
   | 2       |         |
   |         |         |
   | 1       | 1       |
   +---------+---------+
   3 rows in set. Query took 0.011 seconds.
   
   ❯ SELECT *
   FROM table1
   LEFT JOIN table2
       ON table1.column1 = table2.column1
   and table1.column1 IS NOT NULL;
   NotImplemented("Unsupported expressions in Left JOIN: [#table1.column1 IS 
NOT NULL]")
   
   ```
   
   vs
   
   In postgresql:
   
   ```sql
   postgres=# SELECT *
   FROM table1
   LEFT JOIN table2
       ON table1.column1 = table2.column1
   and table2.column1 IS NOT NULL;
    column1 | column1
   ---------+---------
          1 |       1
          2 |
            |
   (3 rows)
   
   postgres=# SELECT *
   FROM table1
   LEFT JOIN table2
       ON table1.column1 = table2.column1
   and table1.column1 IS NOT NULL;
    column1 | column1
   ---------+---------
          1 |       1
          2 |
            |
   (3 rows)
   ```
   Case1 and Case2 show the same behavior, this is not a problem with IS NULL 
and IS NOT NULL.
   
   ---
   
   ### Case 3
   In datafusion:
   
   ```sql
   ❯  select * from C LEFT JOIN D ON (C.column1 = D.column1) where C.column1>5;
   0 rows in set. Query took 0.013 seconds.
   
   ❯  select * from C LEFT JOIN D ON (C.column1 = D.column1) where D.column1>5;
   0 rows in set. Query took 0.012 seconds.
   ```
   
   In postgresql:
   ```sql
   postgres=# select * from C LEFT JOIN D ON (C.c = D.d) where  C.c>5;
    c | d
   ---+---
   (0 rows)
   
   postgres=# select * from C LEFT JOIN D ON (C.c = D.d) where  D.d>5;
    c | d
   ---+---
   (0 rows)
   ```
   They are consistent.
   
   ---
   
   Case4(contains null row)
   
   In datafusion:
   ```sql
   ❯ SELECT * FROM table1;
   +---------+
   | column1 |
   +---------+
   | 1       |
   | 2       |
   |         |
   +---------+
   3 rows in set. Query took 0.005 seconds.
   
   ❯ SELECT * FROM table2;
   +---------+
   | column1 |
   +---------+
   | 1       |
   | 3       |
   |         |
   +---------+
   3 rows in set. Query took 0.003 seconds.
   ❯
   
   ❯ SELECT *
   FROM table1
   LEFT JOIN table2
       ON table1.column1 = table2.column1
   WHERE table2.column1 IS NOT NULL;
   +---------+---------+
   | column1 | column1 |
   +---------+---------+
   | 2       |         |
   | 1       | 1       |
   +---------+---------+
   2 rows in set. Query took 0.013 seconds.
   ```
   vs
   
   In postgresql:
   
   ```sql
   postgres=# SELECT *
   FROM table1
   LEFT JOIN table2
       ON table1.column1 = table2.column1
   WHERE table2.column1 IS NOT NULL;
    column1 | column1
   ---------+---------
          1 |       1
   (1 row)
   ```
   If the `WHERE` clause contains `NULL` expr, we shouldn't push down the 
filter expr in `WHERE`.
   
   ```
   
+---------------+---------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                       
                                                                           |
   
+---------------+---------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: #table1.column1, #table2.column1               
                                                                           |
   |               |   Join: #table1.column1 = #table2.column1                  
                                                                           |
   |               |     Filter: #table1.column1 IS NOT NULL                    
                                                                           |
   |               |       TableScan: table1 projection=Some([0])               
                                                                           |
   |               |     Filter: #table2.column1 IS NOT NULL                    
                                                                           |
   |               |       TableScan: table2 projection=Some([0])
   ```
   
   should be
   
   ```
   
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                       
                                                                               |
   
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: #table1.column1, #table2.column1               
                                                                               |
   |               |   Filter: #table1.column1 IS NOT NULL                      
                                                                               |
   |               |     Join: #table1.column1 = #table2.column1                
                                                                               |
   |               |       TableScan: table1 projection=Some([0])               
                                                                               |
   |               |       TableScan: table2 projection=Some([0])               
                                                                               |
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to