alamb commented on pull request #1339:
URL: https://github.com/apache/arrow-datafusion/pull/1339#issuecomment-981570431


   @xudong963  thank you for the examples
   
   Given this input:
   ```sql
   ❯ create table table1 as SELECT * FROM (VALUES (1), (2), (null)) as t;
   0 rows in set. Query took 0.006 seconds.
   ❯ create table table2 as SELECT * FROM (VALUES (1), (3), (null)) as t;
   0 rows in set. Query took 0.003 seconds.
   ```
   
   The plan that comes out is incorrect:
   
   ```sql
   ❯ explain SELECT *
   FROM table1
   LEFT JOIN table2
       ON table1.column1 = table2.column1
   WHERE table2.column1 IS NOT NULL;
   
+---------------+---------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                       
                                                                           |
   
+---------------+---------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: #table1.column1, #table2.column1               
                                                                           |
   |               |   Join: #table1.column1 = #table2.column1                  
                                                                           |
   |               |     Filter: #table1.column1 IS NOT NULL                    
                                                                           |
   |               |       TableScan: table1 projection=Some([0])               
                                                                           |
   |               |     Filter: #table2.column1 IS NOT NULL                    
                                                                           |
   |               |       TableScan: table2 projection=Some([0])               
                                                                           |
   
   ```
   
   I think the problem is this line:
   ```
   |               |     Filter: #table1.column1 IS NOT NULL                    
                                                                           |
   ```
   
   Namely something has added a predicate to `table1.column1` (where the 
original query has only a predicate on `table2`).
   
   The same problem happens for `>5` type predicates too:
   
   ```sql
   explain SELECT *
   FROM table1
   LEFT JOIN table2
       ON table1.column1 = table2.column1
   WHERE table2.column1 > 5;
   
+---------------+---------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                       
                                                                           |
   
+---------------+---------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: #table1.column1, #table2.column1               
                                                                           |
   |               |   Join: #table1.column1 = #table2.column1                  
                                                                           |
   |               |     Filter: #table1.column1 > Int64(5)                     
                                                                           |
   |               |       TableScan: table1 projection=Some([0])               
                                                                           |
   |               |     Filter: #table2.column1 > Int64(5)                     
                                                                           |
   |               |       TableScan: table2 projection=Some([0])               
                                                                           
   ```
   
   The equivalent plan from postgres only pushes the filter down for `table2` 
(there is no filter on `table1.column1`):
   
   ```sql
   alamb=# explain SELECT *
   FROM table1
   LEFT JOIN table2
       ON table1.column1 = table2.column1
   WHERE table2.column1 > 5;
                                 QUERY PLAN                              
   ----------------------------------------------------------------------
    Merge Join  (cost=263.02..429.84 rows=10838 width=8)
      Merge Cond: (table2.column1 = table1.column1)
      ->  Sort  (cost=83.23..85.36 rows=850 width=4)
            Sort Key: table2.column1
            ->  Seq Scan on table2  (cost=0.00..41.88 rows=850 width=4)
                  Filter: (column1 > 5)
      ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
            Sort Key: table1.column1
            ->  Seq Scan on table1  (cost=0.00..35.50 rows=2550 width=4)
   (9 rows)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to