Ross Boucher wrote:
My cluster has 4 machines on it, so based on the recommendations on the wiki, I set my reduce count to 8. Unfortunately, the performance was less than ideal. Specifically, when the map functions had finished, I had to wait an additional 40% of the total job time just for copying/sorting the files. I know for a fact that the sort is very fast, so the only remaining question is why moving the files around takes so long.
How much data was there to copy? How long was the total job time? If there is only a small amount of data and the total job time is short, then copy scheduling overhead might be significant.
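To make those two questions concrete, here is a rough back-of-the-envelope sketch. It is not Hadoop code, and the function name, parameters, and sample numbers are all hypothetical. It turns the total job time, the share of it spent after the maps finished, and the bytes copied into an effective copy rate. If that rate comes out far below what the network can sustain, fixed scheduling overhead is a more likely culprit than raw transfer time.

```python
def copy_phase_summary(total_job_seconds, copy_percent, bytes_copied, num_reduces):
    """Estimate the effective copy rate of the post-map phase.

    All inputs are values you would read off your own job; nothing here
    queries Hadoop itself.
    """
    # Time spent copying/sorting after the maps finished.
    copy_seconds = total_job_seconds * copy_percent / 100
    # Aggregate bytes per second moved across the cluster during that time.
    aggregate_rate = bytes_copied / copy_seconds
    # Average rate seen by each reduce task, assuming an even split.
    per_reduce_rate = aggregate_rate / num_reduces
    return copy_seconds, aggregate_rate, per_reduce_rate


if __name__ == "__main__":
    # Hypothetical numbers: a 10 minute job, 40% of it in copy/sort,
    # 240 MB of map output, 8 reduces.
    seconds, total_rate, reduce_rate = copy_phase_summary(600, 40, 240_000_000, 8)
    print(f"copy phase: {seconds:.0f} s")
    print(f"aggregate rate: {total_rate / 1e6:.2f} MB/s")
    print(f"per-reduce rate: {reduce_rate / 1e6:.3f} MB/s")
```

With your real figures plugged in, a per-reduce rate of a few hundred KB/s or less on a local network would suggest the time is going to scheduling rather than to moving bytes.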
Doug
