adriangb commented on PR #21621: URL: https://github.com/apache/datafusion/pull/21621#issuecomment-4466063490
Hey @kumarUjjawal I'm afraid I don't have clear answers off the top of my head.

> Sort(exprs, fetch=N) → Join → Sort(exprs, fetch=N), the outer Sort becomes redundant whenever the join is provably 1-to-≤1 on the preserved key (e.g. unique constraint on the other side, or upstream DISTINCT/GROUP BY on the join column). In that case the pushed Sort's ordering survives the join and the outer Sort + LIMIT could collapse to a plain Limit(N).

This makes a lot of sense, I agree with this.

> Does the FD info on Join's output schema already carry enough to detect this, or would it need more wiring?

Sorry what is an FD?

> Better as a new rule (EliminateRedundantOuterSort) or an extension to an existing one (EliminateLimit / a Sort-dedup pass)?

I am not familiar with e.g. what EliminateLimit does. I think if it fits well in an existing rule that's best, but we shouldn't put two unrelated things in the same rule just because.
