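nuno-faria commented on PR #17197: URL: https://github.com/apache/datafusion/pull/17197#issuecomment-3199697422

> @nuno-faria I was able to confirm this is just a race between updating the filter and starting work on the probe side: https://github.com/pydantic/datafusion/compare/fix-hash-join-partitioned...pydantic:datafusion:demo-race?expand=1

I see, that makes sense. In theory, I think we would need something like what Postgres does: determine at plan time that following the parameterized path is the best approach. That would be quite complex.

> But I don't think that is a good approach long term. The code is more complex and there is potential for deadlocks. The way the current code is structured, even if there are bugs, it should never be slower than not having dynamic filters. They just may take a couple of batches to kick in. They won't help small queries on a local SSD (like we've been testing here) much, but they will help massively for large queries on slower storage, etc.

Agreed. I ran some tests and found that the number of rows is kept to a minimum when the number of partitions is set to 1. On a simple join query, this makes it more than 20x faster than DuckDB.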
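To make the "may take a couple of batches to kick in, but never slower" argument concrete, here is a minimal sketch of the race. It is a simplified model under stated assumptions, not DataFusion's implementation: `DynamicFilter`, `update`, `apply` and `simulate` are hypothetical names, and the filter is assumed to be a plain min/max range on an `i64` join key shared between the build and probe sides. The probe side never waits for the build side; until bounds are published it passes every row through, which is exactly what a join without dynamic filters would do.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for a dynamic join filter (NOT DataFusion's API):
/// min/max bounds of the build-side join keys, published once the build side
/// has been collected. `None` means "not known yet".
#[derive(Default)]
struct DynamicFilter {
    bounds: RwLock<Option<(i64, i64)>>,
}

impl DynamicFilter {
    /// Called by the build side once all of its keys have been seen.
    fn update(&self, min: i64, max: i64) {
        *self.bounds.write().unwrap() = Some((min, max));
    }

    /// Called by the probe side for every batch it reads.
    fn apply(&self, batch: &[i64]) -> Vec<i64> {
        // Copy the current bounds out so the read lock is released immediately.
        let bounds = *self.bounds.read().unwrap();
        match bounds {
            // Filter not ready yet: pass every row through, i.e. behave exactly
            // like a join without dynamic filters (no pruning, but no extra cost).
            None => batch.to_vec(),
            Some((lo, hi)) => batch
                .iter()
                .copied()
                .filter(|k| *k >= lo && *k <= hi)
                .collect(),
        }
    }
}

/// Simulates the race: the probe side starts reading before the build side
/// has published its bounds. Returns the number of rows kept per probe batch.
fn simulate() -> Vec<usize> {
    let filter = Arc::new(DynamicFilter::default());
    let probe_batches: Vec<Vec<i64>> = vec![vec![1, 50, 99], vec![2, 60, 98], vec![3, 70, 97]];

    let mut rows_kept = Vec::new();
    for (i, batch) in probe_batches.iter().enumerate() {
        if i == 1 {
            // The build side finishes only after the first probe batch was read.
            let build_side = Arc::clone(&filter);
            thread::spawn(move || build_side.update(40, 80)).join().unwrap();
        }
        rows_kept.push(filter.apply(batch).len());
    }
    rows_kept
}

fn main() {
    // The first batch is unfiltered; the filter "kicks in" from the second batch on.
    println!("{:?}", simulate()); // prints [3, 1, 1]
}
```

The design choice this models is the one argued for above: because the probe side never blocks on the filter, there is no opportunity for a deadlock, and the worst case of losing the race is a few unpruned batches rather than wrong results or a slower plan.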
> @nuno-faria I was able to confirm this is just a race between updating the filter and starting work on the probe side: https://github.com/pydantic/datafusion/compare/fix-hash-join-partitioned...pydantic:datafusion:demo-race?expand=1 I see, that makes sense. I think in theory we would need to have something like what Postgres does and determine at plan time that following the parameterized path would be the best approach, which would be quite complex. >But I don't think that is a good approach long term. The code is more complex and there are potentials for deadlocks. The way the current code is structured even if there are bugs it should never be slower than not having dynamic filters. They just may take a couple batches to kick in, they won't help small queries on a local SSD (like we've been testing here) much but will help massively for large queries on slower storage, etc. Agreed. I did some tests and found that the number of rows is kept to the minimuim when the number of partitions is set to 1. On a simple join query this makes it more than 20x faster than DuckDB. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org