metesynnada opened a new pull request, #5754:
URL: https://github.com/apache/arrow-datafusion/pull/5754

   # Which issue does this PR close?
   
   Closes https://github.com/apache/arrow-datafusion/issues/5715.
   
   # Rationale for this change
   
   The current implementation of `SymmetricHashJoin` requires order information 
before the `PipelineFixer` rule runs. This limitation imposes unnecessary sort 
and distribution enforcement requirements that degrade the optimizer's 
performance. To overcome this issue, we have revamped `SymmetricHashJoin` so 
that it no longer needs order information before `PipelineFixer`.
   
   # What changes are included in this PR?
   
   We have modified `SymmetricHashJoin` to work without order information. The 
new implementation no longer raises an error when order or filter information 
is missing, although pruning is not supported without order information. 
Furthermore, we have set `required_input_ordering` to `None`, which lets the 
source provide order information for piped executions. With this approach, 
`PipelineFixer` no longer depends on `EnforceSorting` and 
`EnforceDistribution`.
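
   To illustrate why order information matters only for pruning, here is a 
minimal standalone sketch of a symmetric hash join in Rust. It is not the code 
from this PR: `ToySymmetricJoin`, `Side`, `push`, `prune_below` and 
`buffered_rows` are hypothetical names, rows are plain `(key, value)` integers, 
and the join key doubles as the ordered column, which is a simplifying 
assumption.

```rust
use std::collections::HashMap;

/// Which input a row arrived from.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Side {
    Left,
    Right,
}

/// Toy symmetric hash join over `(key, value)` rows (hypothetical type).
#[derive(Default)]
pub struct ToySymmetricJoin {
    left: HashMap<i64, Vec<i64>>,
    right: HashMap<i64, Vec<i64>>,
}

impl ToySymmetricJoin {
    /// Probe the opposite side's table, then buffer the row on its own side.
    /// Returns the `(left_value, right_value)` pairs produced by this row.
    /// No ordering of the inputs is needed for the result to be correct.
    pub fn push(&mut self, side: Side, key: i64, value: i64) -> Vec<(i64, i64)> {
        let (own, other) = match side {
            Side::Left => (&mut self.left, &self.right),
            Side::Right => (&mut self.right, &self.left),
        };
        let mut out = Vec::new();
        if let Some(matches) = other.get(&key) {
            for &m in matches {
                out.push(match side {
                    Side::Left => (value, m),
                    Side::Right => (m, value),
                });
            }
        }
        own.entry(key).or_default().push(value);
        out
    }

    /// Drop buffered rows whose key is below `watermark` and return how many
    /// rows were removed. This is only safe if the caller knows no future row
    /// can carry a smaller key, i.e. if the inputs are ordered.
    pub fn prune_below(&mut self, watermark: i64) -> usize {
        let before = self.buffered_rows();
        self.left.retain(|k, _| *k >= watermark);
        self.right.retain(|k, _| *k >= watermark);
        before - self.buffered_rows()
    }

    /// Total number of rows currently buffered on both sides.
    pub fn buffered_rows(&self) -> usize {
        self.left.values().chain(self.right.values()).map(Vec::len).sum()
    }
}
```

   In this sketch, a caller that never invokes `prune_below` still gets correct 
join output; the only cost is that the buffers keep growing. That mirrors the 
trade-off described above: without order information the join runs, but it 
cannot prune.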
   
   # Are these changes tested?
   
   Yes.
   
   # Are there any user-facing changes?
   
   A new configuration option (`allow_symmetric_joins_without_pruning`) allows 
the user to constrain `SymmetricHashJoin`s to run only on ordered inputs. There 
are no breaking changes in existing use cases.
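
   As a rough illustration of how such a flag could gate the choice, here is a 
small standalone sketch. It does not reproduce the planner logic in this PR: 
`JoinChoice` and `choose_symmetric_join` are hypothetical names, and the 
decision is reduced to two booleans (whether the inputs are ordered, and the 
value of the option).

```rust
/// Outcome of the toy decision (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum JoinChoice {
    /// Inputs are ordered, so buffered state can be pruned.
    SymmetricWithPruning,
    /// Inputs are unordered, but the option permits running without pruning.
    SymmetricWithoutPruning,
    /// Inputs are unordered and the option forbids running without pruning.
    NotApplied,
}

/// Decide whether a symmetric hash join is used, assuming the only inputs to
/// the decision are input ordering and the configuration flag.
pub fn choose_symmetric_join(
    inputs_ordered: bool,
    allow_symmetric_joins_without_pruning: bool,
) -> JoinChoice {
    match (inputs_ordered, allow_symmetric_joins_without_pruning) {
        (true, _) => JoinChoice::SymmetricWithPruning,
        (false, true) => JoinChoice::SymmetricWithoutPruning,
        (false, false) => JoinChoice::NotApplied,
    }
}
```

   What happens in the `NotApplied` case (a different join, or an error for 
unbounded inputs) is outside the scope of this sketch.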


