nuno-faria opened a new issue, #17188: URL: https://github.com/apache/datafusion/issues/17188
### Describe the bug

The Dynamic Filter Pushdown optimization (#16445) is causing joins to return incorrect results when the predicate has multiple conditions.

### To Reproduce

```sql
copy (select i as k from generate_series(1, 10000000) as t(i)) to 't1.parquet';
copy (select i as k, i as v from generate_series(1, 10000000) as t(i)) to 't2.parquet';
create external table t1 stored as parquet location 't1.parquet';
create external table t2 stored as parquet location 't2.parquet';
```

In the following query, some runs evaluate only `v=1`, while others evaluate only `v=10000000`:

```sql
select *
from t1
join t2 on t1.k = t2.k
where v = 1 or v = 10000000;
```

We can see that the `DynamicFilterPhysicalExpr` changes between runs:

```
+---+---+---+
| k | k | v |
+---+---+---+
| 1 | 1 | 1 |
+---+---+---+

predicate=DynamicFilterPhysicalExpr [ k@0 >= 1 AND k@0 <= 1 ]
```

```
+----------+----------+----------+
| k        | k        | v        |
+----------+----------+----------+
| 10000000 | 10000000 | 10000000 |
+----------+----------+----------+

predicate=DynamicFilterPhysicalExpr [ k@0 >= 10000000 AND k@0 <= 10000000 ]
```

If we use a smaller table, both `v=1` and `v=10000000` will be considered, so maybe the pushdown is not waiting until the filtered side is fully completed?
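To make that hypothesis concrete, here is a minimal sketch. It assumes the dynamic filter is derived from the minimum and maximum join key of the filtered side, which the `k@0 >= ... AND k@0 <= ...` shape of the printed predicates suggests but which is not confirmed here. Under that assumption, the bounds over the complete filtered side can be computed directly:

```sql
-- Assumption: the dynamic filter bounds are min(k) and max(k)
-- over the rows of t2 that survive the `where` clause.
select
    min(k)   as min_k,         -- lower bound a complete build side would give
    max(k)   as max_k,         -- upper bound a complete build side would give
    count(*) as matching_rows  -- number of rows the join should return
from t2
where v = 1 or v = 10000000;
```

With the data generated above, `k` equals `v`, so this gives `min_k = 1`, `max_k = 10000000`, and `matching_rows = 2`. A predicate such as `k@0 >= 1 AND k@0 <= 1` would then correspond to bounds taken from only part of the filtered side.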
Here is a bigger example to show how it changes:

```sql
copy (select i as k from generate_series(1, 10000000) as t(i)) to 't1.parquet';
copy (select i as k, i % 10000 as v from generate_series(1, 10000000) as t(i)) to 't2.parquet';
create external table t1 stored as parquet location 't1.parquet';
create external table t2 stored as parquet location 't2.parquet';

select *
from t1
join t2 on t1.k = t2.k
where v = 1 or v = 10;
```

And a few runs:

```
output_rows=1944
predicate=DynamicFilterPhysicalExpr [ k@0 >= 200010 AND k@0 <= 9910001 ]

output_rows=1990
predicate=DynamicFilterPhysicalExpr [ k@0 >= 30010 AND k@0 <= 9970010 ]

output_rows=1978
predicate=DynamicFilterPhysicalExpr [ k@0 >= 60010 AND k@0 <= 9920001 ]
```

### Expected behavior

Return the correct results.

### Additional context

Disabling `datafusion.optimizer.enable_dynamic_filter_pushdown` returns the correct results.
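As a rough sketch of how the bigger example can be checked, the statements below disable the option and compute the row count the join should produce. The `set` statement syntax is an assumption about the SQL client in use (it is the form `datafusion-cli` accepts for configuration options). The count query relies on every `k` in `t2` having exactly one match in `t1`, which holds for the tables created above.

```sql
-- Workaround from this report: turn the optimization off for the session.
-- Assumption: the client accepts `set <option> = <value>`.
set datafusion.optimizer.enable_dynamic_filter_pushdown = false;

-- Expected number of joined rows for the bigger example:
-- each matching row of t2 joins with exactly one row of t1.
select count(*) as expected_rows
from t2
where v = 1 or v = 10;
```

Since `v` is `i % 10000` over 10,000,000 rows, each value of `v` occurs 1000 times, so `expected_rows` is 2000. The runs listed above (1944, 1990, 1978) all fall short of that.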