Hi,

I've written several MapReduce jobs, but I've
noticed that they take a long time to finish
because of the sort and reduce phases. I currently
run each job with 1 reducer. Is there a guideline
for how many reduce tasks to use? If I'm running
a job on 4 boxes, does that mean I should specify
at most 4 reduce tasks? Would the results be
different if the number of reduce tasks were
different?

In Google's implementation, there is a way
to specify a partition function. Does Hadoop
have a similar feature?
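
To make the question concrete, here is a rough sketch of what I mean by a
partition function: something that maps each intermediate key to one of R
reduce tasks, for example by hashing. This is plain Java and does not use
any Hadoop API. The class name PartitionSketch and the method partitionFor
are names I made up for illustration, and hash-modulo is only my assumption
about what a default might look like.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop.
class PartitionSketch {

    // Map a key to a partition index in the range [0, numReduceTasks).
    static int partitionFor(String key, int numReduceTasks) {
        // Clear the sign bit so a negative hashCode cannot give a negative index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "banana", "cherry", "apple");

        // Compare the assignment with 1 reduce task and with 4 reduce tasks.
        for (int reducers : new int[] {1, 4}) {
            for (String key : keys) {
                System.out.println(reducers + " reducer(s): " + key
                        + " -> partition " + partitionFor(key, reducers));
            }
        }
    }
}
```

With a function like this, every occurrence of the same key goes to the
same reduce task for a given R, but which keys share a reduce task (and so
which output file they end up in) changes when R changes.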

