adriangb commented on PR #17197: URL: https://github.com/apache/datafusion/pull/17197#issuecomment-3198010057
@nuno-faria I was able to confirm this is just a race between updating the filter and starting work on the probe side: https://github.com/pydantic/datafusion/compare/fix-hash-join-partitioned...pydantic:datafusion:demo-race?expand=1

On that branch I consistently get `20480` output rows on both sides of the join. But I don't think that is a good approach long term: the code is more complex and there is potential for deadlocks.

The way the current code is structured, even if there are bugs it should never be slower than not having dynamic filters. They may just take a couple of batches to kick in. They won't help small queries on a local SSD (like we've been testing here) much, but will help massively for large queries on slower storage, etc.
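To make the race concrete, here is a minimal, single-threaded sketch of the idea. It is not DataFusion's actual code: `DynamicBounds`, `publish`, `keep`, and `scan_and_join` are hypothetical names, and the filter is assumed to be a simple min/max bound on the join key. The `publish_before` parameter simulates how many probe batches are scanned before the build side finishes and the filter is updated.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a dynamic filter (not a DataFusion type).
/// `None` means "no bounds published yet, keep every row".
#[derive(Clone, Default)]
pub struct DynamicBounds {
    inner: Arc<RwLock<Option<(i64, i64)>>>,
}

impl DynamicBounds {
    /// Called once the build side has finished: publish min/max of the build keys.
    pub fn publish(&self, build_keys: &[i64]) {
        let min = build_keys.iter().copied().min();
        let max = build_keys.iter().copied().max();
        if let (Some(lo), Some(hi)) = (min, max) {
            *self.inner.write().unwrap() = Some((lo, hi));
        }
    }

    /// Called by the probe-side scan for each row.
    pub fn keep(&self, key: i64) -> bool {
        match *self.inner.read().unwrap() {
            Some((lo, hi)) => lo <= key && key <= hi,
            None => true, // filter not populated yet: pass everything through
        }
    }
}

/// Scans the probe batches in order. The filter is published just before the
/// batch with index `publish_before` is scanned (never, if that index is out
/// of range). Returns (rows emitted by the scan, rows that match the join).
/// Build keys are assumed to be unique.
pub fn scan_and_join(
    build_keys: &[i64],
    probe_batches: &[Vec<i64>],
    publish_before: usize,
) -> (usize, usize) {
    let filter = DynamicBounds::default();
    let mut scanned = 0;
    let mut joined = 0;
    for (i, batch) in probe_batches.iter().enumerate() {
        if i == publish_before {
            filter.publish(build_keys);
        }
        for &key in batch {
            if filter.keep(key) {
                scanned += 1;
                if build_keys.contains(&key) {
                    joined += 1;
                }
            }
        }
    }
    (scanned, joined)
}
```

In this sketch, publishing the filter late only changes how many rows the scan emits. The join result is the same either way, because the bounds only ever discard rows that could not match. Forcing the filter to be ready before the first probe batch would require the probe side to wait on the build side, which is where the extra complexity and deadlock risk would come from.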