jhorstmann commented on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-892847628
> I tried to do this but I couldn't figure out how to do so using the `std::hash::Hash` API. I didn't find a way to return the hash value directly, only to update the intermediate state of a `Hasher`.

You're right, coming from a Java background I always forget that this works very differently in Rust. It should work with `insert_with_hasher`, though, since that callback gets the key as a parameter and returns a `u64`.

> `insert_hashed_no_check` is a good one, though I think it still requires that consistency between `create_hash` and `ScalarValue::hash` for the case of collisions, right?

Yes, that is my understanding. The method is probably a bit dangerous, since inconsistencies will only show up when the map gets resized.

> Yes that is a cool idea. I wonder if we could use something like that as an initial partial aggregate pass: we would initially aggregate each batch partially as you describe and then update the overall aggregates from the partials.

That is also my idea, though maybe not per batch: aggregate bigger partitions in parallel and then merge the results. The main benefit is not the hashing, though, but that updating the accumulators can be done with a generic function instead of dynamic dispatch.
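For reference, the standard `Hasher` API can produce a standalone `u64` digest for a single key: create a fresh hasher per value, feed the value in via `Hash::hash`, and read the final state with `finish()`. A minimal sketch:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Compute a standalone u64 hash for one value by running it
/// through a fresh Hasher and reading out the final state.
fn hash_one<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // The same value hashed with the same hasher state yields the same digest.
    assert_eq!(hash_one(&42u32), hash_one(&42u32));
    println!("hash of 42u32: {}", hash_one(&42u32));
}
```

A closure like `|key| hash_one(key)` is exactly the shape `insert_with_hasher` expects for its rehash callback.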
