Hello, On Monday 20 August 2007 13:15, Ian Lynagh wrote: > ... > I'm also suspicious of this, though: > > -- | A sample hash function for Strings. We keep multiplying by the > -- golden ratio and adding. > -- > -- Note that this has not been extensively tested for reasonability, > -- but Knuth argues that repeated multiplication by the golden ratio > -- will minimize gaps in the hash space. > hashString :: String -> Int32 > hashString = foldl' f 0 > where f m c = fromIntegral (ord c + 1) * golden + mulHi m golden > > should this be > > where f m c = (fromIntegral (ord c + 1) + m) * golden > > ? Does Knuth (TAOCP?) say?
In the 2nd edition of Knuth's The Art of Computer Programming, Vol 3, Sorting and Searching there is a discussion of hash functions on pp. 514-520. One of the techniques suggested for hashing a one-word (i.e. essentially fixed-size) key is the following multiplicative scheme: h(K) = floor ( M*(((A/w)*K)) mod 1) ) where w is the word-size (say, 2^32), M is the desired limit of the hash function (for efficiency, probably a suitable power of 2) and, finally, A is some integer constant. What happens here is that we consider the (word) K as a fraction with the binary point at the left end of the word rather than at the right, thus getting a fraction with a value between 0 and 1. This value we then multiply by A and cut off the integer part, once again getting a fractional value between 0 and 1. And finally, we multiply by M and cut away the fractional part to get an integer value between 0 and M-1. And, sure, Knuth suggests various variants of selecting the multiplier A related to the golden ratio (sqrt(5)-1)/2 = 0.6180... to gain suitable spreading of hashes for keys in arithmetic progressions. (K, K+d, K+2d, ...). But what we are dealing with in the hashString function is what Knuth would call a multiword or variable-length key. Such cases, Knuth suggests, "can be handled by multiple-precision extensions of [e.g. the multiplicative scheme] above, but it is generally adequate to speed things up by combining the individual words together into a single word, then doing a single multiplication ... as above." Neither of the above definitions of f implement a multiple-precision extension of the multiplicative hashing scheme that involves the golden ration. And none of the methods suggested by Knuth for combining multiple words into single words or otherwise compute hashes for multiword keys involve the golden ration. So I cannot find obvious traces of Knuth having anything at all to say about either of the f's. > ... Best regards Thorkil _______________________________________________ Glasgow-haskell-bugs mailing list Glasgow-haskell-bugs@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs