> You are aware of course that you can't use any hashing function on its own to
> detect duplicates? - the best you can do is detect *probable* duplicates,

Actually, if you choose the right hash function you can detect duplicates
for all practical purposes.

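If you create a UDF based on SHA256, the result would be unique for all
practical purposes: any two given distinct inputs collide with probability
about 2^-256 (roughly n^2/2^257 across n rows, by the birthday bound), and
no SHA256 collision has ever been found
(https://en.wikipedia.org/wiki/Hash_function_security_summary).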
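A minimal Python sketch of the idea, assuming rows arrive as tuples.
row_digest is a hypothetical stand-in for the SHA256 UDF (not any particular
database's function), and find_duplicates and collision_probability are
likewise illustrative names. The last one puts a number on "practically
unique" using the birthday bound.

```python
import hashlib
import json
from collections import defaultdict


def row_digest(row):
    """Hypothetical stand-in for a SHA256 UDF: hash one row's values.

    The row is serialised with json.dumps so field boundaries are
    unambiguous: ("ab", "c") and ("a", "bc") produce different input bytes.
    """
    payload = json.dumps(list(row), ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def find_duplicates(rows):
    """Group row indices by digest; return only groups with 2+ members."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row_digest(row)].append(i)
    return [idxs for idxs in groups.values() if len(idxs) > 1]


def collision_probability(n, bits=256):
    """Birthday-bound approximation: n*(n-1)/2 pairs, each colliding
    with probability 2**-bits (assumes the hash behaves like a random
    function)."""
    return n * (n - 1) / 2 / 2**bits


rows = [("alice", 1), ("bob", 2), ("alice", 1), ("ab", "c"), ("a", "bc")]
print(find_duplicates(rows))         # [[0, 2]]
print(collision_probability(10**9))  # about 4.3e-60
```

Even at a billion rows the estimated chance of any accidental collision is
around 10^-60, which is why grouping on the digest alone is usually treated
as exact in practice.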


Sean
