nuno-faria opened a new issue, #17196:
URL: https://github.com/apache/datafusion/issues/17196
### Describe the bug
In the following example, the Dynamic Filter Pushdown is built from `t2 (k,
v)` and correctly pushed to `t1 (k)`, to reduce the number of scanned rows from
`t1`:
```sql
select *
from t1 -- 1M rows, will become the right side
join t2 on t1.k = t2.k
where t2.v = 1000000; -- 1 row, will become the left side
HashJoinExec join_type=Inner build_time=78.306µs, join_time=22.402µs
DataSourceExec t2.parquet
predicate=v@1 = 1000000
output_rows=1
DataSourceExec t1.parquet
predicate=DynamicFilterPhysicalExpr [ k@0 >= 1000000 AND k@0 <=
1000000 ] -- dynamic filter added here
output_rows=1 -- only returns 1 row
```
Now if the `t2` side is greater than `t1`, then the order of the join
switches, as the smaller relation becomes the build side. However, the Dynamic
Filter Pushdown is still applied to the right relation (now `t2`) instead of
`t1`:
```sql
select *
from t1 -- 1M rows, will become the left side
join t2 on t1.k = t2.k
where t2.v >= 1000000; -- 9M rows, will become the right side
HashJoinExec join_type=Inner build_time=54.3462ms, join_time=2.403µs -- join
is more expensive
DataSourceExec t1.parquet
output_rows=1000000
DataSourceExec t2.parquet
predicate=v@1 >= 1000000 AND DynamicFilterPhysicalExpr [ k@0 >= 3
AND k@0 <= 999998 ] -- dynamic filter added here
output_rows=0 -- should be 1, possibly due to issue
https://github.com/apache/datafusion/issues/17188
```
### To Reproduce
```sql
copy (select i as k from generate_series(1, 1000000) t(i)) to 't1.parquet';
copy (select i as k, i as v from generate_series(1, 10000000) t(i)) to
't2.parquet';
create external table t1 stored as parquet location 't1.parquet';
create external table t2 stored as parquet location 't2.parquet';
-- works
explain analyze select *
from t1
join t2 on t1.k = t2.k
where t2.v = 1000000;
-- pushed to the wrong relation
explain analyze select *
from t1
join t2 on t1.k = t2.k
where t2.v >= 1000000;
```
### Expected behavior
Dynamic Filter applied to the correct table.
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]