Generally, this seems fine. Some comments about the patch itself: + // Check that we only have six bits. + assert(unsigned(Type) < 1u << NumBitsPerType && + "Hash is invalid: too many types");
I would love parentheses around the << operator as otherwise my brain struggles with the precedence rules. Also, maybe a complementary static_assert on the enum itself? + static_assert(!FunctionLikeDecl, "FunctionLikeDecl should have value 0"); I would find comparing with 0 more clear than using ! on an enum... On Wed, Mar 26, 2014 at 2:51 PM, Duncan P. N. Exon Smith < [email protected]> wrote: > On Mar 25, 2014, at 5:39 PM, Duncan P. N. Exon Smith <[email protected]> > wrote: > > > On Mar 25, 2014, at 2:30 PM, Chandler Carruth <[email protected]> > wrote: > > > >> > >> On Tue, Mar 25, 2014 at 1:59 PM, Duncan P. N. Exon Smith < > [email protected]> wrote: > >> 2. Grab the upper 64 bits of MD5. > >> > >> How about this. > >> > >> Pack the bits of the enum into a 64-bit integer. When we fill the > integer, spin up an MD5 context, and shove the integer through it, rinse, > repeat. When done, if you have an MD5, take the upper 64-bits. If not, just > take the integer. > >> > >> This should incur no cost for the most common case (only a few CFG > elements) and no collisions roughly ever, and scale nicely due to using > 64-bit chunks. > >> > >> s/MD5/any other stable hashing function really/ -- my hope is that > after doing this, the performance of the case where all the CFG elements > didn't just serialize is relatively unimportant. > >> > >> Then, benchmark it, and if MD5 is a problem, revisit it with faster > and/or lower overhead algorithms which still have well known and fixed > results such as MD4, blake2, etc. > > > > Sounds like a great compromise. I’ll get a new patch posted soon. > > Thanks again for the reviews. Here’s an updated patch. I chose to leave > 6 bits > for the enum to leave space for expansion (so 10 values get packed per > 64-bits). > > > On Mar 25, 2014, at 2:33 PM, Chandler Carruth <[email protected]> > wrote: > > > >> As a co-worker who also works on hashing just reminded me, this relies > on the fact that none of the enum values are 0, otherwise you end up with > 'x' and 'yx' for two sequences which trivially collide. > > Note: there is an enum value that’s zero: FunctionLikeDecl. Each time a > hash is > calculated, the first (and only the first) counter is assigned to the > function > itself (or objective-c block, etc.). The hash logic effectively skips it. > Only > subsequent (and non-zero) values get combined. > > The enum values that do get combined are all non-zero. > >
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
