westonpace commented on PR #13487:
URL: https://github.com/apache/arrow/pull/13487#issuecomment-1171714784

   > I'm not sure how to validate the hash outputs are "as expected". Further, unit tests for the hashing functions don't seem to validate hash outputs.
   
   For a hashing function I would expect:
    * If two values are equal, then their hashes are equal.
    * Given a random selection of non-equal values, there should be some
      expected false positive rate (i.e. equal hashes on unequal values). Ideally
      we would include, as part of this, a benchmark that measures the FPR on
      random values. You could then pick a safe threshold (e.g. if the benchmark
      tends to show a 5% FPR, pick 10%) and put that into the unit test (i.e.
      assert that the FPR is below the safe threshold).
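A minimal sketch of both properties as a Python test, under a few assumptions: `fnv1a_64` is a stand-in 64-bit FNV-1a hash, not the function under review in this PR, and hashes are truncated to their low 4 bits so that collisions are frequent enough to measure (with full 64-bit outputs, collisions on random data are vanishingly rare). The helper names, the 4-bit truncation and the 0.125 threshold are all hypothetical; `measure_false_positive_rate` plays the role of the benchmark, and `test_hash_properties` hard-codes a threshold of roughly 2x the rate a uniform hash would give (1/16).

```python
import random
from typing import Callable, Iterable

HashFn = Callable[[bytes], int]


def fnv1a_64(data: bytes) -> int:
    """Stand-in 64-bit FNV-1a hash (hypothetical; not the function under review)."""
    h = 0xCBF29CE484222325  # FNV-1a 64-bit offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF  # FNV 64-bit prime, wrapped to 64 bits
    return h


def check_equal_values_hash_equal(hash_fn: HashFn, values: Iterable[bytes]) -> bool:
    """Property 1: equal values must produce equal hashes."""
    for v in values:
        copy = bytes(bytearray(v))  # an equal value rebuilt from a fresh buffer
        if hash_fn(v) != hash_fn(copy):
            return False
    return True


def measure_false_positive_rate(
    hash_fn: HashFn, num_pairs: int, value_len: int, bits: int, seed: int = 0
) -> float:
    """Property 2 (benchmark): fraction of random unequal pairs whose hashes,
    truncated to the low `bits` bits, are equal."""
    rng = random.Random(seed)  # fixed seed keeps the measurement deterministic
    mask = (1 << bits) - 1
    collisions = 0
    trials = 0
    while trials < num_pairs:
        a = rng.getrandbits(8 * value_len).to_bytes(value_len, "little")
        b = rng.getrandbits(8 * value_len).to_bytes(value_len, "little")
        if a == b:
            continue  # only unequal values count toward the FPR
        trials += 1
        if (hash_fn(a) & mask) == (hash_fn(b) & mask):
            collisions += 1
    return collisions / num_pairs


def test_hash_properties() -> None:
    assert check_equal_values_hash_equal(fnv1a_64, [b"", b"arrow", b"\x00\x01\x02"])
    # A uniform 4-bit hash collides on ~1/16 (6.25%) of unequal pairs, so the
    # unit test asserts a safe threshold of about twice that rate.
    fpr = measure_false_positive_rate(fnv1a_64, num_pairs=10_000, value_len=8, bits=4)
    assert fpr < 0.125


if __name__ == "__main__":
    test_hash_properties()
    print("hash property checks passed")
```

In a real suite the threshold would be taken from the numbers the benchmark actually reports for the hash under test rather than from the theoretical 1/16 used here.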


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.