On Mar 24, 2014, at 2:10 PM, Chandler Carruth <[email protected]> wrote:
> +void MapRegionCounters::combineHash(ASTHash Val) {
> + // Use Bernstein's hash.
> + Hash = Hash * 33 + Val;
> +}
>
> So, you care a lot about collisions, but you use Bernstein's hash? I don't
> get it. This will collide like crazy. Note that Bernstein's hashing algorithm
> is only ever a "good" distribution for hashing ascii-ish text, and you're
> hashing dense things.
This will have fewer collisions than:
++Hash; // What we’re effectively doing now.
Bernstein *is* weak. But we’re not using this for a hash table. In a hash
table, you need hashes to be well-distributed, because you need few collisions
of Hash%Size for arbitrary Size.
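As a toy illustration of that distribution problem (my own sketch, not part of
the patch): 33 == 32 + 1, so multiplying by 33 is a no-op modulo any power of
two up to 32, and Bernstein degenerates to a plain sum of its inputs for small
power-of-two table sizes:

  #include <cstdint>
  #include <cstdio>

  int main() {
    unsigned Buckets[8] = {};
    for (uint64_t A = 0; A < 4; ++A)
      for (uint64_t B = 0; B < 4; ++B) {
        uint64_t Hash = 0;
        Hash = Hash * 33 + A; // combineHash(A)
        Hash = Hash * 33 + B; // combineHash(B)
        // Hash % 8 == (A + B) % 8, so the middle buckets are hit
        // repeatedly and bucket 7 is never used at all.
        ++Buckets[Hash % 8];
      }
    for (unsigned I = 0; I != 8; ++I)
      printf("bucket %u: %u\n", I, Buckets[I]);
  }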
In contrast, this code:
- compares the full 64 bits,
- checks separately that the (mangled) function name matches, and
- checks separately that the number of counters matches.
It seems improbable to me that we’ll get a lot of collisions in this use case.
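Concretely, here is a hypothetical sketch of the effective check (the record
layout and names are invented for illustration, not taken from the patch). A
record is only accepted when all three conditions hold, so the hash only has
to avoid collisions on full 64-bit equality:

  #include <cstdint>
  #include <string>

  struct FunctionRecord {       // hypothetical record layout
    std::string MangledName;
    uint64_t FunctionHash;      // the full 64-bit AST hash
    unsigned NumCounters;
  };

  static bool matchesProfile(const FunctionRecord &R, const std::string &Name,
                             uint64_t Hash, unsigned NumCounters) {
    return R.MangledName == Name &&      // name checked separately
           R.FunctionHash == Hash &&     // all 64 bits compared
           R.NumCounters == NumCounters; // counter count checked separately
  }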
> If you care about collisions, I would use the high 64 bits of an md5 (or
> better yet, all 128 bits). Or some other well researched and understood
> algorithm.
MD5 looks way too slow (!). Are you suggesting I should import SpookyHash [1]
or CityHash [2]? They seem like overkill for this use case, but I’m definitely
not a hashing expert.
Another idea is to use a simple checksum algorithm, like a variation on
Fletcher’s [3]. Any thoughts on that?
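For concreteness, a 64-bit variation might look something like this (a rough
sketch; the 32-bit word size and the 2^32 - 1 modulus are my assumptions,
following the classic two-sum recurrence, not anything in the patch):

  #include <cstdint>

  class FletcherHash {
    uint64_t Sum1, Sum2;

  public:
    FletcherHash() : Sum1(0), Sum2(0) {}

    void combine(uint32_t Val) {
      // Fletcher's recurrence: a plain running sum plus a sum-of-sums,
      // both reduced modulo 2^32 - 1 so that later inputs keep
      // influencing both halves of the result.
      Sum1 = (Sum1 + Val) % 0xFFFFFFFFu;
      Sum2 = (Sum2 + Sum1) % 0xFFFFFFFFu;
    }

    uint64_t result() const { return (Sum2 << 32) | Sum1; }
  };

Since Sum2 weights earlier inputs more heavily, the result is order-sensitive,
and the combine step costs about the same as the Bernstein one.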
[1]: http://burtleburtle.net/bob/hash/spooky.html
[2]: https://code.google.com/p/cityhash/
[3]: http://en.wikipedia.org/wiki/Fletcher%27s_checksum