neilconway commented on PR #22652: URL: https://github.com/apache/datafusion/pull/22652#issuecomment-4638905090
@Dandandan Q23 doesn't result in a plan change and I can't repro it locally; I think that one is just noise. Q14 does indeed repro.

The plan _should_ strictly be an improvement (it just swaps inner joins for semi-joins). It regresses because `RightSemi` joins are actually _slower_ than inner joins in DF right now :) We basically compute an inner join and then do an extra pass to eliminate any duplicates (`get_semi_indices`).

Some ideas:

1. We could pass down a flag indicating that the join is unique on the join keys (case 2a described above), so we can skip the duplicate removal. This _should_ get semi-joins back to parity with inner joins.
2. Alternatively, we could write a specialized join kernel for semi and anti joins. This has the opportunity to be a significant win, not just get back to parity.

I'm inclined to pursue (2) and am taking a look at that now.
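To make the trade-off concrete, here is a toy Rust model of the three strategies over integer join keys and row indices. It is a sketch under simplifying assumptions, not DataFusion's code: `build_table`, `right_semi_via_inner`, `right_semi_anti_kernel`, and the `build_keys_unique` flag are hypothetical names, and the dedup pass only mimics the role attributed to `get_semi_indices` above.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified build-side hash table: join key -> build-row indices.
fn build_table(build_keys: &[i64]) -> HashMap<i64, Vec<usize>> {
    let mut table: HashMap<i64, Vec<usize>> = HashMap::new();
    for (b, &k) in build_keys.iter().enumerate() {
        table.entry(k).or_default().push(b);
    }
    table
}

/// Right semi join computed like an inner join followed by a dedup pass.
/// If `build_keys_unique` is true (idea 1), each probe row matches at most
/// one build row, so the dedup pass can be skipped.
fn right_semi_via_inner(
    table: &HashMap<i64, Vec<usize>>,
    probe_keys: &[i64],
    build_keys_unique: bool,
) -> Vec<usize> {
    // Inner-join probe: one (build, probe) pair per matching combination.
    let mut pairs: Vec<(usize, usize)> = Vec::new();
    for (p, k) in probe_keys.iter().enumerate() {
        if let Some(builds) = table.get(k) {
            for &b in builds {
                pairs.push((b, p));
            }
        }
    }
    if build_keys_unique {
        // Probe indices are already distinct and ascending: no extra pass needed.
        return pairs.iter().map(|&(_, p)| p).collect();
    }
    // Extra pass: mark matched probe rows, then emit each one exactly once.
    let mut matched = vec![false; probe_keys.len()];
    for &(_, p) in &pairs {
        matched[p] = true;
    }
    (0..probe_keys.len()).filter(|&p| matched[p]).collect()
}

/// Specialized semi/anti kernel (idea 2): only the existence of a match matters,
/// so the build side is just a key set and no (build, probe) pairs are materialized.
fn right_semi_anti_kernel(build_keys: &[i64], probe_keys: &[i64], anti: bool) -> Vec<usize> {
    let keys: HashSet<i64> = build_keys.iter().copied().collect();
    probe_keys
        .iter()
        .enumerate()
        .filter(|&(_, k)| keys.contains(k) != anti) // semi keeps matches, anti keeps non-matches
        .map(|(p, _)| p)
        .collect()
}

fn main() {
    let build_keys: [i64; 4] = [1, 2, 2, 3]; // key 2 is duplicated on the build side
    let probe_keys: [i64; 4] = [2, 4, 1, 2];
    let table = build_table(&build_keys);
    println!("{:?}", right_semi_via_inner(&table, &probe_keys, false)); // [0, 2, 3]
    println!("{:?}", right_semi_anti_kernel(&build_keys, &probe_keys, false)); // [0, 2, 3]
    println!("{:?}", right_semi_anti_kernel(&build_keys, &probe_keys, true)); // [1]

    // With unique build keys, the flag lets us skip the dedup pass.
    let unique_build: [i64; 3] = [1, 2, 3];
    println!("{:?}", right_semi_via_inner(&build_table(&unique_build), &probe_keys, true)); // [0, 2, 3]
}
```

In this model the inner-join path materializes one pair per (build, probe) match before deduplicating, so duplicated build keys inflate the intermediate work; the specialized kernel never produces those pairs, which is where a win beyond parity could come from.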
