neilconway opened a new issue, #22930:
URL: https://github.com/apache/datafusion/issues/22930

   ### Is your feature request related to a problem or challenge?
   
   #22914 means that we will no longer produce intermediate duplicate output 
rows for `RightSemi`, but the hash join still stores duplicate rows on its 
build side. We could consider eliminating them. This would reduce hash join 
memory consumption, but the tradeoff is that we might waste work eliminating 
duplicates that would never have participated in the join in the first place. 
This merits further study; we could perhaps make deduplication conditional on 
the fraction of duplicate hash values we observe on the build side as we 
execute the operator.
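
To make the conditional idea concrete, here is a minimal sketch of how the duplicate fraction might be tracked while the build side is consumed. Everything in it is an assumption for illustration: `DupTracker`, `observe`, `duplicate_fraction`, `should_dedup`, and the threshold parameter are hypothetical names, not existing DataFusion APIs. Equal hash values only approximate equal join keys, so a real implementation would still need to compare the keys themselves.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not a DataFusion API): counts how many build-side
/// key hashes repeat, so the operator can decide whether to deduplicate.
struct DupTracker {
    /// Distinct hash values seen so far.
    seen: HashSet<u64>,
    /// Total number of build-side rows observed.
    rows: usize,
}

impl DupTracker {
    fn new() -> Self {
        Self { seen: HashSet::new(), rows: 0 }
    }

    /// Record the hash of one build-side row's join key.
    fn observe(&mut self, key_hash: u64) {
        self.rows += 1;
        self.seen.insert(key_hash);
    }

    /// Fraction of observed rows whose hash had already been seen.
    fn duplicate_fraction(&self) -> f64 {
        if self.rows == 0 {
            return 0.0;
        }
        (self.rows - self.seen.len()) as f64 / self.rows as f64
    }

    /// Deduplicate only when duplicates are common enough to repay the cost.
    /// The threshold is an assumed tuning knob, not a measured value.
    fn should_dedup(&self, threshold: f64) -> bool {
        self.duplicate_fraction() >= threshold
    }
}
```

Tracking distinct hashes has its own memory cost, so a sampled or approximate counter may be a better fit in practice; that choice is part of what would need study.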
   
   ### Describe the solution you'd like
   
   _No response_
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

