Chris Dyer wrote:
For one computation I've been working on lately, over 25% of the time is
spent in the last 10% of each map/reduce operation (this has to do with the
natural distribution of my input data and would be unavoidable even given an
optimal partitioning).  During this time, I have dozens of nodes sitting
idle that could be executing the map part of the next job, if only the
framework knew that it was coming.  Has anyone dealt with this or found a
good workaround?

If your next job depends on the output of the prior job, then you need to wait for the prior to complete. But if your next job is independent, you can submit it right away, and its map tasks will run as the reduce tasks are running for the prior job.

Doug
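
To make the overlap concrete, here is a toy slot-scheduler simulation in plain Python. This is not Hadoop code; the scheduler, job shapes, and one-step task durations are invented purely to illustrate Doug's point that an independent job's map tasks can fill the slots that free up while the prior job's reduce tail is still running.

```python
def simulate(slots, jobs):
    """Simulate a fixed pool of task slots shared by independent jobs.

    jobs: list of (name, n_maps, n_reduces). A job's reduces may start
    only after all of its own maps finish, but an independent job's maps
    may start whenever a slot is free. Each task takes one time step.
    Returns a log of (time, job, phase) entries, one per scheduled task.
    """
    pending = {name: [m, r] for name, m, r in jobs}
    order = [name for name, _, _ in jobs]
    log = []
    t = 0
    while any(m or r for m, r in pending.values()):
        free = slots
        # Earlier-submitted jobs get first claim on free slots.
        for name in order:
            m, r = pending[name]
            if m:
                run = min(free, m)
                pending[name][0] -= run
                free -= run
                log += [(t, name, "map")] * run
            elif r:
                run = min(free, r)
                pending[name][1] -= run
                free -= run
                log += [(t, name, "reduce")] * run
            if free == 0:
                break
        t += 1
    return log

# Job A: 8 maps, 2 reduces; job B (independent): 6 maps, 3 reduces; 4 slots.
log = simulate(4, [("A", 8, 2), ("B", 6, 3)])
```

At time step 2, A's two reduces occupy only two of the four slots, so B's first two maps run in the same step; B does not wait for A to finish. If B instead depended on A's output, B could not be submitted until A completed and the idle slots would go unused.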