nuno-faria opened a new issue, #17188:
URL: https://github.com/apache/datafusion/issues/17188

   ### Describe the bug
   
   The Dynamic Filter Pushdown optimization (#16445) is causing joins to return 
incorrect results when the predicate has multiple conditions.
   
   ### To Reproduce
   
   ```sql
   copy (select i as k from generate_series(1, 10000000) as t(i)) to 't1.parquet';
   copy (select i as k, i as v from generate_series(1, 10000000) as t(i)) to 't2.parquet';
   create external table t1 stored as parquet location 't1.parquet';
   create external table t2 stored as parquet location 't2.parquet';
   ```
   
   In the following query, some runs evaluate only `v=1`, while others evaluate only `v=10000000`:
   ```sql
   select *
   from t1
   join t2 on t1.k = t2.k
   where v = 1 or v = 10000000;
   ```
   
   The `DynamicFilterPhysicalExpr` differs between runs, and the result changes with it:
   ```
   +---+---+---+
   | k | k | v |
   +---+---+---+
   | 1 | 1 | 1 |
   +---+---+---+
   
   predicate=DynamicFilterPhysicalExpr [ k@0 >= 1 AND k@0 <= 1 ]
   ```
   
   ```
   +----------+----------+----------+
   | k        | k        | v        |
   +----------+----------+----------+
   | 10000000 | 10000000 | 10000000 |
   +----------+----------+----------+
   
   predicate=DynamicFilterPhysicalExpr [ k@0 >= 10000000 AND k@0 <= 10000000 ]
   ```
   
   With a smaller table, both `v=1` and `v=10000000` are returned, so perhaps the pushdown is not waiting until the filtered side has been fully read? Here is a larger example that shows how the filter changes between runs:
   
   ```sql
   copy (select i as k from generate_series(1, 10000000) as t(i)) to 't1.parquet';
   copy (select i as k, i % 10000 as v from generate_series(1, 10000000) as t(i)) to 't2.parquet';
   
   create external table t1 stored as parquet location 't1.parquet';
   create external table t2 stored as parquet location 't2.parquet';
   
   select *
   from t1
   join t2 on t1.k = t2.k
   where v = 1 or v = 10;
   ```
   
   A few runs of this query, each with a different filter and row count:
   ```
   output_rows=1944
   predicate=DynamicFilterPhysicalExpr [ k@0 >= 200010 AND k@0 <= 9910001 ]
   
   output_rows=1990
   predicate=DynamicFilterPhysicalExpr [ k@0 >= 30010 AND k@0 <= 9970010 ]
   
   output_rows=1978
   predicate=DynamicFilterPhysicalExpr [ k@0 >= 60010 AND k@0 <= 9920001 ]
   ```
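
To illustrate why bounds like these would lose rows, here is a minimal Python sketch. It is a hypothetical model, not DataFusion code. It assumes the dynamic filter acts as a plain `k >= lo AND k <= hi` range check on the join keys. The names `build_keys` and `probe_rows_kept` are made up for this example. The model ignores when the filter is applied during the scan, so it does not try to reproduce the exact `output_rows` values.

```python
# Hypothetical model of the second reproduction; not DataFusion code.
N = 10_000_000  # upper bound of generate_series(1, 10000000)

# Keys on the filtered side: t2 rows where v = 1 or v = 10, with v = k % 10000.
build_keys = sorted(
    list(range(1, N + 1, 10_000)) + list(range(10, N + 1, 10_000))
)


def probe_rows_kept(lo: int, hi: int) -> int:
    """Count join matches that survive a range filter `k >= lo AND k <= hi`."""
    return sum(1 for k in build_keys if lo <= k <= hi)


# Bounds taken from the complete filtered side keep every match.
full = probe_rows_kept(min(build_keys), max(build_keys))

# Bounds copied from the three runs reported above.
reported = [(200010, 9910001), (30010, 9970010), (60010, 9920001)]
partial = [probe_rows_kept(lo, hi) for lo, hi in reported]

print(full, partial)
```

In this model, every reported range is narrower than the true key range of the filtered side, so each one keeps fewer matches than the full bounds do. That would be consistent with the filter being built from only part of the filtered side.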
   
   ### Expected behavior
   
   The join should return the same, correct results on every run.
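
For reference, the expected row counts for both reproductions can be derived by hand. The sketch below does that arithmetic in Python. It assumes `generate_series(1, 10000000)` is inclusive at both ends. Since `t1` holds every key exactly once, each filtered `t2` row matches exactly one `t1` row. The variable names are illustrative only.

```python
N = 10_000_000  # upper bound of generate_series(1, 10000000)

# First reproduction: v = k, so `v = 1 or v = 10000000` selects two keys.
expected_first = sum(1 for v in (1, N) if 1 <= v <= N)

# Second reproduction: v = k % 10000, so each residue selects every 10000th key.
expected_second = len(range(1, N + 1, 10_000)) + len(range(10, N + 1, 10_000))

print(expected_first, expected_second)
```

Under these assumptions the first query should return 2 rows and the second 2000, whereas the runs above returned 1 row and between 1944 and 1990 rows.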
   
   ### Additional context
   
   Disabling `datafusion.optimizer.enable_dynamic_filter_pushdown` returns the correct results.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
