http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/
discusses the theory in detail.

On Sun, Oct 24, 2010 at 7:30 AM, Shi Yu <[email protected]> wrote:

> There is a suggestion to set the number of reducers to a prime number
> closest to the number of nodes and number of mappers a prime number closest
> to several times the number of nodes in the cluster. But there is also
> saying that "There is no need for the number of reduces to be prime. The
> only thing it helps is if you are using the HashPartitioner and your key's
> hash function is too linear. In practice, you usually want to use 99% of
> your reduce capacity of the cluster."
>
> Could anyone explain what is the theory behind the prime number and the
> hash function here?
>
> Shi
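To make the "too linear" remark in the quoted text concrete, here is a small sketch in Python. It mirrors the formula Hadoop's HashPartitioner uses, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The helper names partition and reducers_used, and the hash codes that are all multiples of 4, are made up for illustration and are not taken from any real job.

```python
def partition(hash_code, num_reducers):
    # Same arithmetic as Hadoop's HashPartitioner:
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    return (hash_code & 0x7FFFFFFF) % num_reducers


def reducers_used(hash_codes, num_reducers):
    # Count how many distinct reducers receive at least one key.
    return len({partition(h, num_reducers) for h in hash_codes})


# Hypothetical "too linear" hash function: every hash code is a multiple of 4.
hash_codes = [4 * i for i in range(1000)]

for n in (8, 12, 7, 13):
    print(n, "reducers configured,", reducers_used(hash_codes, n), "receive keys")
```

With 8 reducers only partitions 0 and 4 receive keys, and with 12 only partitions 0, 4 and 8, because the stride 4 shares a factor with the reducer count. With 7 or 13 reducers every partition is used, since a prime count shares no factor with the stride (unless the stride is a multiple of that prime). If the hash codes are already well mixed, the reducer count makes little difference, which is the point of the quoted remark.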
