Hi, I've written several MapReduce jobs, but I noticed they take a long time to finish because of the sort and reduce phases. I currently run them with 1 reduce task. Is there a guideline on how many reduce tasks to use? If I'm running a job on 4 boxes, does that mean I should specify at most 4 reduce tasks? Would the results be different if the number of reduce tasks is different?
In Google's implementation, there is a way to specify a partition function. Does Hadoop have a similar feature?
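
To make the partition question concrete, here is a rough sketch of what I mean by a partition function: something that maps each intermediate key to one of the reduce tasks, so that all values for a given key end up at the same reducer. This is plain Python for illustration only, not Hadoop code; the names `partition` and `assign_keys` and the choice of CRC32 as the hash are my own assumptions.

```python
import zlib


def partition(key: str, num_reduce_tasks: int) -> int:
    """Map a key to a reduce task index in [0, num_reduce_tasks)."""
    # CRC32 is used only because it is deterministic across runs;
    # any stable hash would serve for this illustration.
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


def assign_keys(keys, num_reduce_tasks):
    """Group keys by the reduce task that would receive them."""
    buckets = {r: [] for r in range(num_reduce_tasks)}
    for key in keys:
        buckets[partition(key, num_reduce_tasks)].append(key)
    return buckets


if __name__ == "__main__":
    sample_keys = ["apple", "banana", "cherry", "date", "elderberry"]
    for r, assigned in assign_keys(sample_keys, 4).items():
        print(r, assigned)
```

With 1 reduce task every key lands in partition 0, which matches my current setup. With 4, the keys are spread across partitions 0 to 3, but a given key always goes to the same one.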
