On Mar 24, 2014, at 2:10 PM, Chandler Carruth <[email protected]> wrote:

> +void MapRegionCounters::combineHash(ASTHash Val) {
> +  // Use Bernstein's hash.
> +  Hash = Hash * 33 + Val;
> +}
> 
> So, you care a lot about collisions, but you use Bernstein's hash? I don't 
> get it. This will collide like crazy. Note that Bernstein's hashing algorithm 
> is only ever a "good" distribution for hashing ascii-ish text, and you're 
> hashing dense things.

This will have fewer collisions than:

   ++Hash; // What we’re effectively doing now.

Bernstein *is* weak, but we’re not using this for a hash table.  In a hash 
table you need hashes to be well-distributed, because the bucket is chosen 
by Hash % Size, and you need few collisions there for arbitrary Size.

By contrast, this code:

  - compares the full 64-bits,
  - checks separately that the (mangled) function name matches, and
  - checks separately that the number of counters matches.

It seems improbable to me that we’ll get a lot of collisions in this use case.
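Roughly, the check to keep in mind is something like this (hypothetical 
names and types; the actual patch differs in its details):

   #include <cstdint>
   #include <string>

   // Hypothetical record of what was stored when the profile was
   // generated.
   struct ProfileRecord {
     std::string MangledName;
     uint64_t NumCounters;
     uint64_t ASTHash;
   };

   // A stale profile is only accepted if all three checks pass, so a
   // 64-bit hash collision alone can't silently match the wrong data.
   static bool matchesCurrentFunction(const ProfileRecord &R,
                                      const std::string &CurName,
                                      uint64_t CurNumCounters,
                                      uint64_t CurHash) {
     return R.MangledName == CurName &&
            R.NumCounters == CurNumCounters &&
            R.ASTHash == CurHash;
   }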

> If you care about collisions, I would use the high 64 bits of an md5 (or 
> better yet, all 128 bits). Or some other well researched and understood 
> algorithm.

MD5 looks way too slow (!).  Are you suggesting I should import SpookyHash [1] 
or CityHash [2]?  They seem like overkill for this use case, but I’m definitely 
not a hashing expert.

Another idea is to use a simple checksum algorithm, like a variation on 
Fletcher’s [3].  Any thoughts on that?
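For reference, a Fletcher-style variation folded into 64 bits might look 
something like this (a sketch of the general construction, not a vetted 
design):

   #include <cstdint>

   // Fletcher-style checksum over 32-bit words: two running sums, with
   // the second sum accumulating the first, packed as (Sum2 << 32) |
   // Sum1.  The modulus 2^32 - 1 follows the classic construction.
   struct Fletcher64 {
     uint64_t Sum1 = 0, Sum2 = 0;

     void update(uint32_t Word) {
       Sum1 = (Sum1 + Word) % 0xFFFFFFFFu;
       Sum2 = (Sum2 + Sum1) % 0xFFFFFFFFu;
     }

     uint64_t result() const { return (Sum2 << 32) | Sum1; }
   };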

[1]: http://burtleburtle.net/bob/hash/spooky.html
[2]: https://code.google.com/p/cityhash/
[3]: http://en.wikipedia.org/wiki/Fletcher%27s_checksum