alamb commented on issue #790:
URL: 
https://github.com/apache/arrow-datafusion/issues/790#issuecomment-895400332


   > And I think in OLAP system, the hash key should always be unique (No 
rehashing exists).
   
   Thanks for the pointer @sundy-li . I agree that the value being hashed is 
always unique. However, I do think that in very rare cases multiple different 
input values can produce the same output hash value, i.e. collisions do 
happen. 
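
To illustrate the collision point, here is a minimal sketch of a group-by count that still compares the real key after a hash match. It is not DataFusion's implementation: `weak_hash` and `count_groups` are hypothetical names, and the hash is deliberately poor so that `"ab"` and `"ba"` collide.

```rust
use std::collections::HashMap;

/// Deliberately weak hash (hypothetical, for illustration only): it sums the
/// bytes of the key, so permutations of the same bytes collide.
fn weak_hash(key: &str) -> u64 {
    key.bytes().map(u64::from).sum()
}

/// Count rows per distinct key. Buckets are indexed by hash value, and each
/// bucket keeps the full key so colliding keys can still be told apart.
fn count_groups(rows: &[&str]) -> Vec<(String, u64)> {
    let mut buckets: HashMap<u64, Vec<(String, u64)>> = HashMap::new();
    for &row in rows {
        let bucket = buckets.entry(weak_hash(row)).or_default();
        // Compare the real key, not just the hash value.
        let pos = bucket.iter().position(|(k, _)| k.as_str() == row);
        match pos {
            Some(i) => bucket[i].1 += 1,
            None => bucket.push((row.to_string(), 1)),
        }
    }
    // Flatten the buckets and sort so the result order is deterministic.
    let mut out: Vec<(String, u64)> = buckets.into_values().flatten().collect();
    out.sort();
    out
}
```

Without the key comparison, rows `"ab"` and `"ba"` would be merged into one group even though the values being hashed are distinct.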
   
   > The first one could be partly resolved by updating/introducing a method 
that takes &ScalarValue values instead of an owned one which requires some more 
clones. That will require a lot of updating though probably, around the 
accumulators, etc.
   
   > And it can get away when the actual array contents could be stored in an 
array directly from the start, which I think is the more longer term plan.
   
   @Dandandan  yes, I think building up the actual array content (rather than 
using `ScalarValue`) would be the best approach here. I tried to rework the 
output creation code to avoid copying `ScalarValue` (by taking ownership), but I 
couldn't figure out how to do it because the table is stored row-wise (one row 
with multiple group columns).
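
As a rough sketch of the difference between the two layouts (using plain `Vec`s rather than Arrow arrays, and a hypothetical `Scalar` enum standing in for `ScalarValue`), the column-wise variant below appends each group's key values to per-column storage as groups are created, so producing the output is a move rather than a per-value clone. All type and method names here are assumptions for illustration.

```rust
/// Hypothetical stand-in for DataFusion's `ScalarValue`; not the real type.
#[derive(Clone, Debug, PartialEq)]
enum Scalar {
    Int(i64),
    Text(String),
}

/// Row-wise layout: one entry per group, each holding all group columns.
struct RowWiseGroups {
    rows: Vec<Vec<Scalar>>,
}

impl RowWiseGroups {
    /// Building column `col` from borrowed rows clones every value.
    fn column(&self, col: usize) -> Vec<Scalar> {
        self.rows.iter().map(|row| row[col].clone()).collect()
    }
}

/// Column-wise layout: one vector per group column, filled as groups appear.
struct ColumnWiseGroups {
    columns: Vec<Vec<Scalar>>,
}

impl ColumnWiseGroups {
    fn new(num_columns: usize) -> Self {
        Self { columns: vec![Vec::new(); num_columns] }
    }

    /// Append one new group's key values, one value per column.
    fn push_group(&mut self, key: Vec<Scalar>) {
        for (column, value) in self.columns.iter_mut().zip(key) {
            column.push(value);
        }
    }

    /// The output is the already-built columns, moved out without cloning.
    fn into_columns(self) -> Vec<Vec<Scalar>> {
        self.columns
    }
}
```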


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

