adriangb commented on PR #17197:
URL: https://github.com/apache/datafusion/pull/17197#issuecomment-3193804053

   @nuno-faria thank you for your patience. I can reproduce it now; I must have 
been on the wrong commit. My laptop started having issues a couple of 
days ago, so I've been resetting it (I will probably need to get a new one) and 
must have lost track of which commit I was on.
   
   In any case, I think I found the root cause of both issues: empty 
partitions (with no rows in the hash table / build side) do not report bounds 
-> were not being counted towards cross-partition synchronization -> dynamic 
filters were not being built at the right time, or at all. I think 
d2b8da555aec13fed1c0995baae32fc0e8ba9069 should fix this.
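
To illustrate the failure mode described above, here is a minimal sketch of counter-based cross-partition synchronization in which an empty partition still counts as "reported". The names `SharedBounds`, `Progress`, and `report` are hypothetical and are not taken from the DataFusion code base or from the referenced commit; the sketch only assumes that each build-side partition reports its key min/max exactly once.

```rust
use std::sync::Mutex;

/// Min/max of the join key seen by one build-side partition.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Bounds {
    min: i64,
    max: i64,
}

/// Result of a single report (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Progress {
    /// Some partitions have not reported yet.
    Waiting,
    /// All partitions reported; `None` means every partition was empty.
    Complete(Option<Bounds>),
}

struct State {
    reported: usize,
    merged: Option<Bounds>,
}

/// Hypothetical shared accumulator; not DataFusion's actual type.
struct SharedBounds {
    total_partitions: usize,
    state: Mutex<State>,
}

impl SharedBounds {
    fn new(total_partitions: usize) -> Self {
        Self {
            total_partitions,
            state: Mutex::new(State { reported: 0, merged: None }),
        }
    }

    /// Called once per partition. An empty partition passes `None`, which
    /// contributes no bounds but still increments the counter. Skipping this
    /// call for empty partitions would leave the counter short of
    /// `total_partitions`, so `Complete` would never be returned.
    fn report(&self, bounds: Option<Bounds>) -> Progress {
        let mut state = self.state.lock().unwrap();
        state.reported += 1;

        // Merge the incoming bounds into the running min/max.
        let current = state.merged;
        state.merged = match (current, bounds) {
            (Some(m), Some(b)) => Some(Bounds {
                min: m.min.min(b.min),
                max: m.max.max(b.max),
            }),
            (None, b) => b,
            (m, None) => m,
        };

        if state.reported == self.total_partitions {
            Progress::Complete(state.merged)
        } else {
            Progress::Waiting
        }
    }
}
```

With three partitions where one is empty, the final caller receives the merged bounds only because the empty partition also called `report(None)`.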
   
   Now with the following `q.sql`:
   
   ```sql
   copy (select i as k from generate_series(1, 10000000) as t(i)) to 't1.parquet';
   copy (select i as k, i as v from generate_series(1, 10000000) as t(i)) to 't2.parquet';
   create external table t1 stored as parquet location 't1.parquet';
   create external table t2 stored as parquet location 't2.parquet';
   
   explain analyze select *
   from t1
   join t2 on t1.k = t2.k
   where v = 1;
   ```
   
   When I run `cargo run -p datafusion-cli -- -f q.sql` I consistently get 
`metrics=[output_rows=20480`.
   
   And with this query:
   
   ```sql
   explain analyze select *
   from t1
   join t2 on t1.k = t2.k
   where v = 1 or v = 2;
   ```
   
   I consistently get `DynamicFilterPhysicalExpr [ k@0 >= 1 AND k@0 <= 2 ]`
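
As a rough illustration of what a min/max filter like the one above does, the sketch below renders bounds as a range predicate and applies the same check to probe-side keys. The helpers `range_predicate` and `passes` are hypothetical and only mimic the shape of the displayed expression; they are not DataFusion APIs.

```rust
/// Hypothetical helper: render key bounds as a range predicate string.
fn range_predicate(column: &str, min: i64, max: i64) -> String {
    format!("{column} >= {min} AND {column} <= {max}")
}

/// Hypothetical helper: the same range check applied to one probe-side key.
fn passes(key: i64, min: i64, max: i64) -> bool {
    key >= min && key <= max
}

/// Count how many keys in `1..=n` survive the range filter.
fn surviving_keys(n: i64, min: i64, max: i64) -> usize {
    (1..=n).filter(|&k| passes(k, min, max)).count()
}
```

A range check is conservative: any key between the minimum and the maximum passes, even if it is absent from the build side, so the join itself still has to do the exact matching.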


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

