alamb commented on issue #790:
URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-888609930


   > More concretely regarding the proposal, how exactly the signature works is 
still a bit unclear to me. If it has a fixed length and is calculated somehow, 
then there is a possibility of collisions.
   
   @jhorstmann The idea was that the "signature" is just a hash of the values. 
Collisions are handled by the fact that each entry in the hash table is a *list* 
of indices into the mutable area. If that list holds multiple values, each 
referenced entry in the mutable area must be checked for equality to find the 
correct one. I apologize that this was not clear in the writeup.
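   To make the collision handling concrete, here is a minimal Rust sketch. It 
assumes group keys are already encoded as byte strings. The names `GroupTable`, 
`find_or_insert`, `default_signature` and `length_signature` are hypothetical and 
are not DataFusion APIs. The signature function is pluggable so a deliberately 
weak hash can be used to force collisions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical group table: each signature (hash) maps to a *list* of
/// indices into `keys`, the mutable area that stores the distinct group keys.
struct GroupTable {
    sig: fn(&[u8]) -> u64,
    map: HashMap<u64, Vec<usize>>,
    keys: Vec<Vec<u8>>,
}

/// Ordinary signature: std's default hasher over the key bytes.
fn default_signature(key: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish()
}

/// Deliberately weak signature (only the key length), used to force collisions.
fn length_signature(key: &[u8]) -> u64 {
    key.len() as u64
}

impl GroupTable {
    fn new(sig: fn(&[u8]) -> u64) -> Self {
        GroupTable { sig, map: HashMap::new(), keys: Vec::new() }
    }

    /// Returns the group index for `key`, creating a new group if needed.
    fn find_or_insert(&mut self, key: &[u8]) -> usize {
        let sig = (self.sig)(key);
        let candidates = self.map.entry(sig).or_insert_with(Vec::new);
        // Distinct keys can share a signature, so every candidate index is
        // checked for equality against the key stored in the mutable area.
        for &idx in candidates.iter() {
            if self.keys[idx].as_slice() == key {
                return idx;
            }
        }
        // No match: append the key to the mutable area and record its index.
        let idx = self.keys.len();
        self.keys.push(key.to_vec());
        candidates.push(idx);
        idx
    }
}
```

   With `length_signature`, the keys `b"a"` and `b"b"` share one hash entry whose 
list holds indices 0 and 1. A second lookup of `b"a"` still returns group 0 
because of the equality check.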
   
   > To fix the immediate problem of null values, I would try to encode them 
inline into the Vec<u8>, 
   
   I agree this is a good plan (and I think it is similar to what I was trying 
to describe in the alternative section). I plan to code this version shortly 
so we have something to compare against.
   
   > That means the GroupByScalar implementation for Eq and Hash are not really 
used, and we could replace that with ScalarValue.
   
   Indeed -- I even have a PR which proposes exactly such a change: 
https://github.com/apache/arrow-datafusion/pull/786 :)

