There is a suggestion to set the number of reducers to the prime number closest to the number of nodes, and the number of mappers to the prime number closest to several times the number of nodes in the cluster. But there is also a statement that "There is no need for the number of reduces to be prime. The only thing it helps is if you are using the HashPartitioner and your key's hash function is too linear. In practice, you usually want to use 99% of your reduce capacity of the cluster."
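
To make the "too linear" remark concrete, here is a small sketch in Python (not Hadoop code) that mimics the HashPartitioner formula, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The key set is an assumption made up for illustration: hash codes that are all multiples of 4. The helper names partition and reducer_loads are hypothetical.

```python
from collections import Counter

JAVA_INT_MAX = 0x7FFFFFFF  # value of Java's Integer.MAX_VALUE


def partition(hash_code, num_reduces):
    """Mimic HashPartitioner: (hashCode & Integer.MAX_VALUE) % numReduceTasks."""
    return (hash_code & JAVA_INT_MAX) % num_reduces


def reducer_loads(hash_codes, num_reduces):
    """Return how many keys land on each reducer, indexed by reducer number."""
    counts = Counter(partition(h, num_reduces) for h in hash_codes)
    return [counts.get(r, 0) for r in range(num_reduces)]


# Assumed "too linear" keys: every hash code is a multiple of 4.
linear_hashes = [4 * i for i in range(1200)]

loads_12 = reducer_loads(linear_hashes, 12)  # 12 shares the factor 4 with the stride
loads_13 = reducer_loads(linear_hashes, 13)  # 13 is prime and shares no factor with 4

print("12 reducers:", loads_12)
print("13 reducers:", loads_13)
```

With 12 reducers only reducers 0, 4 and 8 receive keys (400 each), while the other nine sit idle. With 13 reducers every reducer receives 92 or 93 keys. In general, hash codes with a fixed stride s reach only numReduceTasks / gcd(s, numReduceTasks) distinct reducers, and a prime count has no common factor with s unless s is a multiple of that prime. If the hash codes are already well mixed, the choice of a prime makes no difference to the balance.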

Could anyone explain the theory behind the prime number and the hash function here?

Shi
