comphead commented on PR #11218: URL: https://github.com/apache/datafusion/pull/11218#issuecomment-2237811875
> Thank you @comphead and @viirya
>
> I think this code is now correct, though I also think it could be improved (both with the comments from @viirya, my suggestion in [comphead#297](https://github.com/comphead/arrow-datafusion/pull/297) as well as more testing)
>
> Specifically, for testing, given the subtlety of the code involved I am not 100% sure it works for all corner cases. I suggest (as a follow on) we invest in fuzz testing both for SMJ in general as well as for spilling SMJ
>
> https://github.com/apache/datafusion/blob/6c0e4fb5d9ac7a0a2f2b91f8b88d21f0bc0b4424/datafusion/core/tests/fuzz_cases/join_fuzz.rs#L50-L49
>
> I think in particular, making sure we adjust the random inputs to have different numbers of repeated values (as the code in this PR is only going to be exercised when there are many of the same join keys I think)

Filed #11541

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
