crepererum commented on issue #5325:
URL: 
https://github.com/apache/arrow-datafusion/issues/5325#issuecomment-1441476287

   > Update size incrementally for upcoming batch only -> doesn't seem to be a 
solution, since we don't know in advance which hashes have already been counted 
and which haven't; the cost of computing that outweighs the benefit.
   
   Is the problem the `self.values.insert(...)` bit? `HashSet` sadly doesn't 
have a stable entry API (IIRC), but you can use a `HashMap<K, ()>` instead so 
you don't need to hash each key twice (once for the membership check, once for 
the insert).
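   To illustrate the idea, here is a minimal sketch (not the actual DataFusion code; the function name, inputs, and byte-size accounting are made up for the example). `HashMap::entry` hashes the key once and tells you whether it was new, so the size accumulator can be updated exactly when a previously unseen value is inserted, instead of doing a separate `contains` check before `insert`:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Count distinct strings while growing a size estimate only for keys
/// that are actually new. `entry` hashes each key once, unlike a
/// `contains` check followed by a separate `insert`.
fn distinct_with_size(input: &[&str]) -> (usize, usize) {
    // `HashMap<K, ()>` stands in for a `HashSet<K>` to get the entry API.
    let mut values: HashMap<String, ()> = HashMap::new();
    let mut size = 0usize; // accumulated byte estimate, new keys only

    for &v in input {
        match values.entry(v.to_string()) {
            Entry::Vacant(slot) => {
                size += v.len(); // charge memory only on first sight
                slot.insert(());
            }
            Entry::Occupied(_) => {} // already counted; nothing to do
        }
    }
    (values.len(), size)
}

fn main() {
    let (n, size) = distinct_with_size(&["a", "bb", "a", "ccc"]);
    println!("{} distinct, {} bytes", n, size);
}
```

   With this shape the incremental accounting happens exactly once per distinct value, which sidesteps the "don't know which hashes are already counted" problem for the upcoming batch.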


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
