Alberto,

On Tue, Jun 21, 2011 at 10:27 PM, Alberto Andreotti wrote:
> I don't know if speculative maps are on; I'll check. One thing I
> observed is that reduces begin before all maps have finished. I'll also
> check whether the difference is on the map side or the reduce side. I
> believe it's balanced (both get slower when I add more nodes), but I'll
> confirm that.

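Maps and reduces are speculative by default
(mapred.map.tasks.speculative.execution and
mapred.reduce.tasks.speculative.execution both default to true), so it
must have been on. Could you also post the input vs. output record
counts, and similar job statistics, for each of your runs so we can
correlate them?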

The reducers get scheduled early, but they do not actually call
reduce() until all maps are done; until then they only fetch (shuffle)
the map outputs that are already available. When they get scheduled is
configurable: for example, you can have them start only after X% of the
maps have completed via mapred.reduce.slowstart.completed.maps. Its
default of 0.05 starts them once 5% of the maps are done.
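To make the slowstart threshold concrete, here is a small Python model of it. It is only a sketch under an assumption: that reducers become eligible once ceil(slowstart * total_maps) maps have completed, where slowstart is the value of mapred.reduce.slowstart.completed.maps. The function names are hypothetical and not part of any Hadoop API, and the real scheduler may round slightly differently.

```python
import math


# Hypothetical model of the reduce "slowstart" rule; not Hadoop code.
# Assumption: reducers may be scheduled once completed maps reach
# ceil(slowstart * total_maps), where slowstart mirrors the value of
# mapred.reduce.slowstart.completed.maps (default 0.05).
def maps_needed_before_reducers(total_maps: int, slowstart: float = 0.05) -> int:
    """Return how many maps must finish before reducers are scheduled."""
    if total_maps <= 0:
        raise ValueError("total_maps must be positive")
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0.0 and 1.0")
    return math.ceil(slowstart * total_maps)


def reducers_may_start(completed_maps: int, total_maps: int,
                       slowstart: float = 0.05) -> bool:
    """True once enough maps have completed for reducers to start fetching."""
    return completed_maps >= maps_needed_before_reducers(total_maps, slowstart)


if __name__ == "__main__":
    total = 200
    for fraction in (0.05, 0.5, 1.0):
        needed = maps_needed_before_reducers(total, fraction)
        print(f"slowstart={fraction}: reducers start after {needed} of {total} maps")
```

With 200 maps, the default 0.05 lets reducers begin after 10 maps, while a value of 1.0 holds them back until every map has finished. Raising the value trades earlier shuffling for fewer reduce slots sitting idle while maps are still running.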

-- 
Harsh J
