jhorstmann commented on issue #790:
URL: 
https://github.com/apache/arrow-datafusion/issues/790#issuecomment-892847628


   > I tried to do this but I couldn't figure out how to do so using the `std::hash::Hash` API. I didn't find a way to return the hash value directly, only to update the intermediate value of a `Hasher`
   
   You're right, coming from a Java background I always forget that this works 
very differently in Rust.
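   To make the difference concrete, here is a minimal sketch of how a `u64` hash is still obtained through `std::hash::Hash`: the value feeds bytes into a `Hasher`, and the finished hash comes from `Hasher::finish()` (the `hash_one` helper name is just illustrative):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Rust's Hash trait never returns a hash directly; it streams the value's
// bytes into a Hasher, and the u64 comes from Hasher::finish().
fn hash_one<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let a = hash_one(&"group_key");
    let b = hash_one(&"group_key");
    // Hashing the same value with a fresh hasher of the same type
    // yields the same u64.
    assert_eq!(a, b);
}
```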
   
   It should work with `insert_with_hasher` though, since that callback gets the key as a parameter and returns a `u64`.
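   Roughly, that callback has the shape `impl Fn(&K) -> u64`. A std-only sketch of the idea (the `TinyTable` type is purely illustrative, not hashbrown's actual internals): the caller supplies the hash for this insert plus a callback that can re-derive the hash from the key later, e.g. when the table resizes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative stand-in for a hash table that stores precomputed hashes.
struct TinyTable<K, V> {
    entries: Vec<(u64, K, V)>,
}

impl<K, V> TinyTable<K, V> {
    fn new() -> Self {
        TinyTable { entries: Vec::new() }
    }

    // Same callback shape as hashbrown's insert_with_hasher: the callback
    // receives the key and returns its u64 hash, so it must stay consistent
    // with the hash supplied at insertion time.
    fn insert_with_hasher(&mut self, hash: u64, key: K, value: V, hasher: impl Fn(&K) -> u64) {
        debug_assert_eq!(hash, hasher(&key), "callback must agree with the supplied hash");
        self.entries.push((hash, key, value));
    }
}

fn std_hash<T: Hash>(value: &T) -> u64 {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

fn main() {
    let mut table = TinyTable::new();
    let key = "group_a".to_string();
    let hash = std_hash(&key);
    table.insert_with_hasher(hash, key, 1u64, |k| std_hash(k));
    assert_eq!(table.entries.len(), 1);
}
```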
   
   > `insert_hashed_no_check` is a good one, though I think it still requires consistency between `create_hash` and `ScalarValue::hash` for the case of collisions, right?
   
   Yes, that is my understanding. The method is admittedly a bit dangerous, since inconsistencies will only show up when the map gets resized.
   
   > Yes that is a cool idea. I wonder if we could use something like that as 
an initial partial aggregate pass: we would initially aggregate each batch 
partially as you describe and then update the overall aggregates from the 
partials.
   
   That is my idea as well, though maybe not per batch: aggregate bigger partitions in parallel and then merge the results. The main benefit is not the hashing, though, but that the accumulators can be updated with a generic function instead of through dynamic dispatch.
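   A minimal sketch of that scheme, with illustrative stand-ins (plain `u32` keys and an `i64` sum accumulator in place of DataFusion's group keys and accumulators): each partition is aggregated on its own thread into a partial map, and the partials are then merged.

```rust
use std::collections::HashMap;
use std::thread;

// Partial aggregation: fold one partition into its own hash map.
// Updating the accumulator is a plain generic operation here,
// with no dynamic dispatch involved.
fn partial_aggregate(partition: &[(u32, i64)]) -> HashMap<u32, i64> {
    let mut acc = HashMap::new();
    for &(key, value) in partition {
        *acc.entry(key).or_insert(0) += value;
    }
    acc
}

// Merge two partial results by combining accumulators per key.
fn merge(mut left: HashMap<u32, i64>, right: HashMap<u32, i64>) -> HashMap<u32, i64> {
    for (key, value) in right {
        *left.entry(key).or_insert(0) += value;
    }
    left
}

fn main() {
    let partitions = vec![
        vec![(1, 10), (2, 20), (1, 5)],
        vec![(2, 1), (3, 7)],
    ];

    // Aggregate each partition in parallel, then merge the partials.
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|p| thread::spawn(move || partial_aggregate(&p)))
        .collect();

    let merged = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .fold(HashMap::new(), merge);

    assert_eq!(merged[&1], 15);
    assert_eq!(merged[&2], 21);
    assert_eq!(merged[&3], 7);
}
```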


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
