You may also see Knuth's "The Art of Computer Programming". I remember that there is a discussion about prime number and hash function. (It should be in Volume 3 Chapter 6. There is a section about hashing. Sorry that I don't have the book with me and can't give you the page numbers.)
Nicholas ________________________________ From: aniket ray <[email protected]> To: [email protected] Sent: Mon, October 25, 2010 12:12:16 PM Subject: Re: Prime number of reduces vs. linear hash function http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/ <http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/>discusses the theory in detail. On Sun, Oct 24, 2010 at 7:30 AM, Shi Yu <[email protected]> wrote: > There is a suggestion to set the number of reducers to a prime number > closest to the number of nodes and number of mappers a prime number closest > to several times the number of nodes in the cluster. But there is also > saying that "There is no need for the number of reduces to be prime. The > only thing it helps is if you are using the HashPartitioner and your key's > hash function is too linear. In practice, you usually want to use 99% of > your reduce capacity of the cluster." > > Could anyone explain what is the theory behind the prime number and the > hash function here? > > Shi > >
