alamb commented on issue #17169:
URL: https://github.com/apache/datafusion/issues/17169#issuecomment-3189076433

   I believe @Omega359 has also spoken about this use case being a challenge 
(basically deduplicating large datasets). 
   
   While DataFusion's grouping operator is already pretty optimized, it clearly 
could be improved for this particular case. 
   
   > For now I just wanted to capture the issue to see if it's known as I 
imagine it won't be an easy fix 😄
   
   I agree this is not likely to be an easy fix.
   
   But frankly, just having a good reproducer (as you have provided in this 
ticket) is a huge step forward.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

