nuno-faria opened a new issue, #17196:
URL: https://github.com/apache/datafusion/issues/17196

   ### Describe the bug
   
   In the following example, the Dynamic Filter Pushdown is built from `t2 (k, 
v)` and correctly pushed to `t1 (k)`, to reduce the number of scanned rows from 
`t1`:
   
   ```sql
   select *
   from t1 -- 1M rows, will become the right side
   join t2 on t1.k = t2.k
   where t2.v = 1000000; -- 1 row, will become the left side
   
   HashJoinExec join_type=Inner build_time=78.306µs, join_time=22.402µs
       DataSourceExec t2.parquet
           predicate=v@1 = 1000000
           output_rows=1
       DataSourceExec t1.parquet
           predicate=DynamicFilterPhysicalExpr [ k@0 >= 1000000 AND k@0 <= 
1000000 ] -- dynamic filter added here
           output_rows=1 -- only returns 1 row
   ```
   
   Now if the `t2` side is greater than `t1`, then the order of the join 
switches, as the smaller relation becomes the build side. However, the Dynamic 
Filter Pushdown is still applied to the right relation (now `t2`) instead of 
`t1`:
   
   ```sql
   select *
   from t1 -- 1M rows, will become the left side
   join t2 on t1.k = t2.k
   where t2.v >= 1000000; -- 9M rows, will become the right side
   
   HashJoinExec join_type=Inner build_time=54.3462ms, join_time=2.403µs -- join 
is more expensive
       DataSourceExec t1.parquet
           output_rows=1000000
       DataSourceExec t2.parquet
           predicate=v@1 >= 1000000 AND DynamicFilterPhysicalExpr [ k@0 >= 3 
AND k@0 <= 999998 ] -- dynamic filter added here
           output_rows=0 -- should be 1, possibly due to issue 
https://github.com/apache/datafusion/issues/17188
   ```
   
   ### To Reproduce
   
   ```sql
   copy (select i as k from generate_series(1, 1000000) t(i)) to 't1.parquet';
   copy (select i as k, i as v from generate_series(1, 10000000) t(i)) to 
't2.parquet';
   create external table t1 stored as parquet location 't1.parquet';
   create external table t2 stored as parquet location 't2.parquet';
   
   -- works
   explain analyze select *
   from t1
   join t2 on t1.k = t2.k
   where t2.v = 1000000;
   
   -- pushed to the wrong relation
   explain analyze select *
   from t1
   join t2 on t1.k = t2.k
   where t2.v >= 1000000;
   ```
   
   ### Expected behavior
   
   Dynamic Filter applied to the correct table.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to