alamb commented on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-888609930
> More concretely regarding the proposal, how exactly the signature works is still a bit unclear to me. If it has a fixed length and is calculated somehow, then there is a possibility of collisions.

@jhorstmann the idea was that the "signature" is just a hash of the values. Collisions are handled because each entry in the hash table is a *list* of indices into the mutable area: if that list holds more than one index, each referenced entry in the mutable area is checked for equality to find the correct one. This was not very clear in the writeup, and I apologize.

> To fix the immediate problem of null values, I would try to encode them inline into the `Vec<u8>`,

I agree this is a good plan (and I think it is similar to what I was trying to describe in the alternative section). I plan to code this version shortly so we have something to compare against.

> That means the `GroupByScalar` implementation for `Eq` and `Hash` are not really used, and we could replace that with `ScalarValue`.

Indeed, I even have a PR that proposes exactly such a change: https://github.com/apache/arrow-datafusion/pull/786 :)
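To make the collision handling and the inline null encoding concrete, here is a minimal sketch using only the Rust standard library. It is not DataFusion code: `encode_row`, `GroupTable`, `group_index`, `default_hash` and `constant_hash` are hypothetical names, group keys are assumed to be nullable `i64` columns, and the byte layout (one validity byte per column, followed by the value bytes only when the value is present) is just one possible way to encode nulls inline in the `Vec<u8>`. The map stores only the signature (a hash of the encoded key) mapped to a list of indices into the mutable area, and every candidate in that list is compared byte for byte.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical row encoding: each nullable i64 key column becomes a
/// validity byte (0 = null, 1 = valid) followed, for valid values only,
/// by the 8 little-endian value bytes. The marker keeps NULL distinct
/// from every real value, including 0.
fn encode_row(values: &[Option<i64>]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 9);
    for v in values {
        match v {
            None => out.push(0),
            Some(x) => {
                out.push(1);
                out.extend_from_slice(&x.to_le_bytes());
            }
        }
    }
    out
}

/// Signature: a hash of the encoded key bytes.
fn default_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Deliberately terrible hash so every key collides (for demonstration).
fn constant_hash(_bytes: &[u8]) -> u64 {
    42
}

/// Sketch of the group table: signature -> list of indices into `keys`.
struct GroupTable {
    hash_fn: fn(&[u8]) -> u64,
    map: HashMap<u64, Vec<usize>>,
    keys: Vec<Vec<u8>>, // the "mutable area": one encoded key per group
}

impl GroupTable {
    fn new(hash_fn: fn(&[u8]) -> u64) -> Self {
        GroupTable { hash_fn, map: HashMap::new(), keys: Vec::new() }
    }

    /// Returns the group index for `key`, creating a new group if needed.
    fn group_index(&mut self, key: &[u8]) -> usize {
        let sig = (self.hash_fn)(key);
        let candidates = self.map.entry(sig).or_default();
        // Several groups may share a signature (a collision), so each
        // candidate in the list is checked for full equality.
        for &idx in candidates.iter() {
            if self.keys[idx].as_slice() == key {
                return idx;
            }
        }
        // No match: append the key to the mutable area and record its index.
        let idx = self.keys.len();
        self.keys.push(key.to_vec());
        candidates.push(idx);
        idx
    }
}
```

With `constant_hash`, every key lands in the same list, so the rows `(1, NULL)`, `(1, 0)`, `(1, NULL)` all go through the equality check and map to groups 0, 1 and 0; the validity byte is what keeps `NULL` distinct from `0`. With `default_hash` the grouping is the same, only the lists are (usually) shorter.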
