Re: [PR] Fix HashJoinExec sideways information passing for partitioned queries [datafusion]

via GitHub Mon, 18 Aug 2025 11:42:25 -0700


adriangb commented on PR #17197:
URL: https://github.com/apache/datafusion/pull/17197#issuecomment-3198010057


   @nuno-faria I was able to confirm this is just a race between updating the 
filter and starting work on the probe side: 
https://github.com/pydantic/datafusion/compare/fix-hash-join-partitioned...pydantic:datafusion:demo-race?expand=1
   
   On that branch I consistently get `20480` output rows on both sides of 
the join.
   
   But I don't think that is a good approach long term: the code is more 
complex and there is potential for deadlocks. The way the current code is 
structured, even if there are bugs it should never be slower than not having 
dynamic filters. The filters may just take a couple of batches to kick in. 
They won't help small queries on a local SSD (like the ones we've been 
testing here) much, but they will help massively for large queries on slower 
storage, etc.
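
To illustrate the trade-off, here is a toy model of the race. It is only a sketch and does not use DataFusion's actual API: `DynamicBounds`, `publish`, `prune`, and `run` are hypothetical names, and the "filter" is assumed to be a simple min/max bound on the join key. The probe-side scan treats a not-yet-published filter as "pass everything", so a late update only costs extra scanned rows and never changes the join result.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a dynamic filter: inclusive bounds on the join key.
/// `None` means the build side has not published bounds yet, so every row passes.
#[derive(Clone, Default)]
pub struct DynamicBounds {
    inner: Arc<RwLock<Option<(i64, i64)>>>,
}

impl DynamicBounds {
    /// Called by the build side once it knows the min/max of its keys.
    pub fn publish(&self, min: i64, max: i64) {
        *self.inner.write().unwrap() = Some((min, max));
    }

    /// Called by the probe-side scan for each batch. It never blocks waiting
    /// for the build side; it just uses whatever bounds exist right now.
    pub fn prune(&self, batch: &[i64]) -> Vec<i64> {
        let bounds = *self.inner.read().unwrap();
        match bounds {
            None => batch.to_vec(),
            Some((min, max)) => batch
                .iter()
                .copied()
                .filter(|k| (min..=max).contains(k))
                .collect(),
        }
    }
}

/// Scans the probe batches in order and publishes the bounds just before the
/// batch at index `publish_at`, simulating how late the filter update lands.
/// Returns (rows that left the scan, rows that matched the join).
/// Assumes `build` is non-empty.
pub fn run(build: &[i64], probe: &[Vec<i64>], publish_at: usize) -> (usize, usize) {
    let filter = DynamicBounds::default();
    let (mut scanned, mut joined) = (0, 0);
    for (i, batch) in probe.iter().enumerate() {
        if i == publish_at {
            let min = *build.iter().min().unwrap();
            let max = *build.iter().max().unwrap();
            filter.publish(min, max);
        }
        let rows = filter.prune(batch);
        scanned += rows.len();
        joined += rows.iter().filter(|k| build.contains(*k)).count();
    }
    (scanned, joined)
}
```

With a build side of `[10, 11, 12]` and probe batches `[1, 10, 50]`, `[11, 99]`, `[12, 2]`, publishing before the first batch lets 3 rows out of the scan, publishing before the last batch lets 6 out, and never publishing lets all 7 out. The join matches 3 rows in every case. Making the scan wait for the filter would pin the scanned count at the minimum, but at the cost of the extra synchronization described above.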


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


