alamb commented on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-895400332
> And I think in OLAP system, the hash key should always be unique (No rehashing exists).

Thanks for the pointer @sundy-li. I agree that the value being hashed is always unique. I do think that in very rare cases multiple different input values can produce the same output hash value; that is, collisions do happen.

> The first one could be partly resolved by updating/introducing a method that takes &ScalarValue values instead of an owned one which requires some more clones. That will require a lot of updating though probably, around the accumulators, etc.

> And it can get away when the actual array contents could be stored in an array directly from the start, which I think is the more longer term plan.

@Dandandan yes, I think building up the actual array content (rather than using `ScalarValue`) would be the best approach here. I tried to rework the output creation code to avoid copying `ScalarValue` (by taking ownership), but I couldn't figure out how to do it because the table is stored row-wise (one row with multiple group columns).
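To illustrate the collision point, here is a minimal sketch (not DataFusion's actual code) of a group table keyed by hash only, which still has to compare the real group values inside each bucket. All names here (`GroupKey`, `weak_hash`, `GroupCounts`) are hypothetical, and the hash is deliberately weak so that a collision is easy to trigger:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one row of group-by values (one entry per group column).
type GroupKey = Vec<String>;

/// Deliberately weak hash for demonstration only: it sums the bytes of all
/// columns, so ["ab"] and ["ba"] produce the same hash value.
fn weak_hash(key: &GroupKey) -> u64 {
    key.iter()
        .flat_map(|s| s.bytes())
        .map(u64::from)
        .sum()
}

/// Counts rows per group. The map is keyed by hash only; each bucket holds
/// every distinct group key that produced that hash, along with its count.
struct GroupCounts {
    buckets: HashMap<u64, Vec<(GroupKey, u64)>>,
}

impl GroupCounts {
    fn new() -> Self {
        Self { buckets: HashMap::new() }
    }

    /// Records one input row for the given group key.
    fn add(&mut self, key: GroupKey) {
        let bucket = self.buckets.entry(weak_hash(&key)).or_default();
        // Compare the actual values, not just the hash: two different keys
        // may land in the same bucket.
        if let Some(pos) = bucket.iter().position(|(k, _)| *k == key) {
            bucket[pos].1 += 1;
        } else {
            bucket.push((key, 1));
        }
    }

    /// Returns the number of rows seen for the given group key (0 if none).
    fn count(&self, key: &GroupKey) -> u64 {
        self.buckets
            .get(&weak_hash(key))
            .and_then(|b| b.iter().find(|(k, _)| k == key))
            .map(|(_, c)| *c)
            .unwrap_or(0)
    }
}
```

With this sketch, adding `["ab"]`, `["ba"]`, and `["ab"]` again fills a single bucket with two separate groups. Skipping the equality check would merge them into one group, which is why the unique input values alone do not make the check unnecessary.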
