sgrebnov commented on PR #13496:
URL: https://github.com/apache/datafusion/pull/13496#issuecomment-2492691910

   @jonathanc-n, @goldmedal  - I've reviewed this change and it brings back the 
following issue (there is additional context of why filtering added this way 
produces incorrect result)  https://github.com/apache/datafusion/pull/13132 . I 
really like the change but can we improve this to see if we can wrap TableScan 
with Filter as a subquery when it is required
   
   Example query.
   
   ## Original query / LogicalPlan / Result
   ```
   select
        c_custkey,
        count(o_orderkey)
   from
        customer left join orders on c_custkey = o_custkey and o_comment not 
like '%special%requests%'
   group by
        c_custkey
        
   ```
   
   ```console
   |               |  Projection: customer.c_custkey, count(orders.o_orderkey)  
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                   |
   |               |   Aggregate: groupBy=[[customer.c_custkey]], 
aggr=[[count(orders.o_orderkey)]]                                               
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                 |
   |               |     Left Join:  Filter: customer.c_custkey = 
orders.o_custkey                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                 |
   |               |       TableScan: customer projection=[c_custkey]           
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                   |
   |               |       TableScan: orders projection=[o_orderkey, 
o_custkey], full_filters=[orders.o_comment NOT LIKE Utf8("%special%requests%")] 
 
   ```
   
   Result:
   ```
   1489 29
   1269 0
   652  24
   273  0
   51   0
   ...
   ```
   
   ## Existing unparser (main)
   
   ```
   select
        "customer"."c_custkey",
        count("orders"."o_orderkey")
   from
        "customer"
   left join "orders" on
        (("customer"."c_custkey" = "orders"."o_custkey")
                and "orders"."o_comment" not like '%special%requests%')
   group by
        "customer"."c_custkey"
   ```
   
   Result:
   ```
   1489 29
   1269 0
   652  24
   273  0
   51   0
   ...
   ```
   
   ## Proposed change
   
   ```
   select
        "customer"."c_custkey",
        count("orders"."o_orderkey")
   from
        "customer"
   left join "orders" on
        ("customer"."c_custkey" = "orders"."o_custkey")
   where
        "orders"."o_comment" not like '%special%requests%'
   group by
        "customer"."c_custkey"
   ```
   Result
   ```
   1489 29
   652  24
   1091 1
   70   15
   839  14
   ```
   
   If intent for filter to be moved it must be wrapped as subquery in this case:
   
   ```
   select
        c_custkey,
        count(o_orderkey)
   from
        customer left join (select * from orders where o_comment not like 
'%special%requests%') on c_custkey = o_custkey
   group by
        c_custkey
   ```
   
   Result
   ```
   1489 29
   1269 0
   652  24
   273  0
   51   0
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to