adriangb commented on PR #17197: URL: https://github.com/apache/datafusion/pull/17197#issuecomment-3193804053
@nuno-faria thank you for your patience. I can reproduce now; I must have been on the wrong commit. My laptop started having issues a couple of days ago, so I've been resetting it (I will probably need to get a new one) and must have gotten confused about which commit I was on.

In any case, I think I found the root cause of both issues: empty partitions (with no rows in the hash table / build side) do not report bounds -> were not being counted towards cross-partition synchronization -> dynamic filters were not being built at the right time / at all.

I think d2b8da555aec13fed1c0995baae32fc0e8ba9069 should fix this. Now with the following `q.sql`:

```sql
copy (select i as k from generate_series(1, 10000000) as t(i)) to 't1.parquet';
copy (select i as k, i as v from generate_series(1, 10000000) as t(i)) to 't2.parquet';
create external table t1 stored as parquet location 't1.parquet';
create external table t2 stored as parquet location 't2.parquet';
explain analyze select * from t1 join t2 on t1.k = t2.k where v = 1;
```

When I run `cargo run -p datafusion-cli -- -f q.sql` I consistently get `metrics=[output_rows=20480`.

And with this query:

```sql
explain analyze select * from t1 join t2 on t1.k = t2.k where v = 1 or v = 2;
```

I consistently get `DynamicFilterPhysicalExpr [ k@0 >= 1 AND k@0 <= 2 ]`
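
To illustrate the root cause described above, here is a minimal, self-contained sketch of cross-partition bounds synchronization. It is not DataFusion's actual code: `SharedBounds`, `report`, and `simulate` are hypothetical names, and the bounds are simplified to a single `i64` key. The idea is that the filter is only finalized once every build-side partition has reported, so an empty partition that never reports keeps the counter from reaching the total.

```rust
/// Min/max of the join key seen by one build-side partition.
type Bounds = (i64, i64);

/// Hypothetical stand-in for the state shared by all build partitions.
/// (Illustrative only; not a DataFusion type.)
struct SharedBounds {
    total_partitions: usize,
    completed: usize,
    merged: Option<Bounds>,
}

impl SharedBounds {
    fn new(total_partitions: usize) -> Self {
        Self { total_partitions, completed: 0, merged: None }
    }

    /// Record one finished partition; `None` means the partition was empty.
    /// Returns the merged bounds once every partition has reported
    /// (and at least one of them was non-empty).
    fn report(&mut self, bounds: Option<Bounds>) -> Option<Bounds> {
        self.completed += 1;
        if let Some((lo, hi)) = bounds {
            self.merged = Some(match self.merged {
                Some((mlo, mhi)) => (mlo.min(lo), mhi.max(hi)),
                None => (lo, hi),
            });
        }
        if self.completed == self.total_partitions {
            self.merged
        } else {
            None
        }
    }
}

/// Drive the accumulator over a list of partitions.
/// With `count_empty = false`, empty partitions never report,
/// which models the behaviour described as the bug.
fn simulate(partitions: &[Option<Bounds>], count_empty: bool) -> Option<Bounds> {
    let mut shared = SharedBounds::new(partitions.len());
    let mut result = None;
    for p in partitions {
        if p.is_none() && !count_empty {
            continue; // empty partition skipped: counter never reaches the total
        }
        if let Some(b) = shared.report(*p) {
            result = Some(b);
        }
    }
    result
}
```

In this sketch, calling `simulate` with two non-empty partitions holding keys 1 and 2 plus two empty ones yields bounds of `(1, 2)` when empty partitions are counted, and no bounds at all when they are skipped, which mirrors a dynamic filter that is never built.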