metesynnada opened a new pull request, #5754: URL: https://github.com/apache/arrow-datafusion/pull/5754
# Which issue does this PR close?

Closes https://github.com/apache/arrow-datafusion/issues/5715.

# Rationale for this change

The current implementation of `SymmetricHashJoin` requires order information to be available before the `PipelineFixer` rule runs. This limitation forces unnecessary sort and distribution enforcement requirements that impact the optimizer's performance. To overcome this issue, we have revamped `SymmetricHashJoin` so that it no longer needs order information before `PipelineFixer`.

# What changes are included in this PR?

We have modified `SymmetricHashJoin` to function without requiring order information. The new implementation does not raise errors when order or filter information is missing, though pruning is not supported without order information. Furthermore, we have set the `required_input_ordering` API to `None`, enabling the source to provide order information for piped executions. With this approach, we can remove the dependency on `EnforceSorting` and `EnforceDistribution` from `PipelineFixer`.

# Are these changes tested?

Yes.

# Are there any user-facing changes?

A new configuration option (`allow_symmetric_joins_without_pruning`) lets the user constrain `SymmetricHashJoin`s to run only on ordered inputs. There are no breaking changes in existing use cases.
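To make the interaction between order information, pruning, and the new option easier to follow, here is a minimal, self-contained sketch of the decision described above. It is not DataFusion code: `JoinChoice`, `JoinInputs`, and `choose_join` are hypothetical names introduced only for illustration, and treating the non-prunable case as a rejected plan when the option is disabled is an assumption based on the description, not a copy of the actual rule.

```rust
/// Hypothetical, simplified model of the planner decision described in this PR.
/// None of these types or functions are DataFusion APIs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinChoice {
    /// At least one input is bounded, so the symmetric join is not considered here.
    HashJoin,
    /// Both inputs are unbounded and order/filter information allows buffer pruning.
    SymmetricWithPruning,
    /// Both inputs are unbounded; the join runs but cannot prune its buffers.
    SymmetricWithoutPruning,
    /// Assumption: the plan is refused when pruning is impossible and the option is off.
    Rejected,
}

/// Hypothetical summary of what the planner knows about the two join inputs.
#[derive(Debug, Clone, Copy)]
struct JoinInputs {
    left_unbounded: bool,
    right_unbounded: bool,
    /// True when order and filter information are both available, so pruning can work.
    can_prune: bool,
}

/// Decide how to run the join. `allow_without_pruning` stands in for the
/// `allow_symmetric_joins_without_pruning` configuration option.
fn choose_join(inputs: JoinInputs, allow_without_pruning: bool) -> JoinChoice {
    // The symmetric variant only matters when both sides are streams.
    if !(inputs.left_unbounded && inputs.right_unbounded) {
        return JoinChoice::HashJoin;
    }
    match (inputs.can_prune, allow_without_pruning) {
        // Order and filter information are present: pruning is supported.
        (true, _) => JoinChoice::SymmetricWithPruning,
        // No order information, but the user allows running without pruning.
        (false, true) => JoinChoice::SymmetricWithoutPruning,
        // No order information and the user restricted joins to ordered inputs.
        (false, false) => JoinChoice::Rejected,
    }
}
```

In this model the option only matters in the last two arms: when both inputs are unbounded and pruning is not possible, it decides whether the join still runs with unpruned buffers or is refused.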
