On 23-Oct-2001 Steve Meyers wrote:
>> > At a previous job, we tested a 32-bit hash function by running it
>> > against hundreds of thousands of unique URL's stored in our
>> > database. We found one collision. A 64-bit hash is billions of
>> > times better (4 billion, to be exact).
>>
>> Good to know. I wonder how many collisions I'd find if I ran it over
>> every URL listed in the directory www.yahoo.com.
>>
>> Which 64 bit hash function did you use? Invent your own, or something
>> "off the shelf"?
>>
>
> We found a public domain one on the net see
> http://www.burtleburtle.net/bob/hash/evahash.html for some sample code.
> It's only a 32-bit hash though. However, that same page appears to have
> instructions for a 64-bit hash function as well, but I haven't tried it at
> all. I'd be curious to know how many collisions you find hashing all the
> URL's in yahoo's database :) I don't know how long that would take, but if
> you do it I'd like to hear the results.
>
> Since the hash function takes a key and an initial value, you could try
> running it with two different initial values and/or keys. This would give
> you effectively a 128-bit hash, which you could store across two fields in
> MySQL. I'm guessing that the 64-bit hash will probably be good enough
> though.
>
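
As a rough sanity check on the collision figures quoted above, the usual birthday approximation says that n keys hashed uniformly into 2^bits values give about n(n-1)/2 / 2^bits colliding pairs. The sketch below assumes an ideal, uniformly distributed hash, which no real function guarantees; the function name and the sample sizes are made up for illustration.

```python
def expected_collisions(n: int, bits: int) -> float:
    """Approximate number of colliding pairs among n keys,
    assuming an ideal uniform hash with 2**bits possible values."""
    pairs = n * (n - 1) / 2       # number of distinct key pairs
    return pairs / 2 ** bits      # each pair collides with probability 2**-bits


if __name__ == "__main__":
    # Illustrative sizes only: "hundreds of thousands" and "2+ million".
    for n in (100_000, 2_000_000):
        for bits in (32, 64):
            print(f"n={n:>9,} bits={bits}: {expected_collisions(n, bits):.3g}")
```

Under that assumption, going from 32 to 64 bits divides the expected count by 2^32, or about 4.29 billion, which is in line with the "4 billion" figure quoted, and about one collision among 100,000 keys is what a 32-bit hash would be expected to produce.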
To store URL hashes I use CONV(PASSWORD('$url'),16,10) in a
BIGINT UNSIGNED column. 2+ million so far, and no collisions.

Regards,
-- 
Don Read [EMAIL PROTECTED]
-- It's always darkest before the dawn. So if you are going to
   steal the neighbor's newspaper, that's the time to do it.
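
For anyone who would rather compute the hash in application code before the INSERT, here is a minimal sketch of the same idea: reduce a URL to an unsigned 64-bit integer that fits a BIGINT UNSIGNED column, and optionally hash twice with different keys to get two 64-bit halves, as suggested in the quoted message. It is not PASSWORD() and not the hash from the burtleburtle.net page; it swaps in BLAKE2b from Python's standard hashlib, and the function names and seed values are hypothetical.

```python
import hashlib


def url_hash64(url: str, seed: bytes = b"") -> int:
    """Return an unsigned 64-bit integer derived from the URL.

    Assumption: BLAKE2b truncated to 8 bytes stands in for the
    hash functions discussed in the thread.
    """
    digest = hashlib.blake2b(url.encode("utf-8"), digest_size=8, key=seed).digest()
    return int.from_bytes(digest, "big")


def url_hash128(url: str) -> tuple[int, int]:
    """Two 64-bit hashes under different (hypothetical) seeds,
    meant to be stored in two BIGINT UNSIGNED columns."""
    return url_hash64(url, b"seed-a"), url_hash64(url, b"seed-b")


if __name__ == "__main__":
    print(url_hash64("http://www.mysql.com/"))
    print(url_hash128("http://www.mysql.com/"))
```

A real 128-bit digest would also work in place of the two seeded halves; the two-column layout is kept here only because it mirrors the suggestion in the thread.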