doug,

it doesn't matter how the code is structured; what does matter is that
the reduce phase and shuffle phase have very different timelines and
resource requirements, and should not both be charged against the number
of reduce tasks permitted.

it should be possible to have lots of tasks in the shuffle phase
(mostly sitting around waiting for mappers to run), but only have
"about" one actual reduce phase running per cpu (or whatever works for
each of our apps) that gets enough memory for a sorter, does
substantial computation, etc.

maybe that's what you meant; if so, apologies. just wanted to be clear.

i'm sure that can be done with a single task/thread that does both
phases, and that's probably the simplest way to code it.
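a minimal sketch of that idea in python, not hadoop code: each reduce task gets one thread that shuffles and then reduces, and only the reduce phase has to take a slot from a semaphore sized to something like the cpu count. `ReduceSlotScheduler`, `max_reduce_phases` and `run_demo` are hypothetical names, and the `time.sleep` calls stand in for waiting on mappers and doing the sort/reduce work.

```python
import threading
import time


class ReduceSlotScheduler:
    """Hypothetical sketch: any number of tasks may be shuffling at once,
    but at most `max_reduce_phases` may be in the reduce phase together."""

    def __init__(self, max_reduce_phases):
        # Only the reduce phase is charged against these slots.
        self._reduce_slots = threading.BoundedSemaphore(max_reduce_phases)
        self._lock = threading.Lock()
        self.active_reduces = 0
        self.peak_reduces = 0
        self.results = {}

    def _shuffle(self, task_id, map_outputs):
        # Shuffle phase: mostly idle, waiting on / copying map outputs.
        fetched = []
        for out in map_outputs:
            time.sleep(0.001)  # stand-in for waiting on a mapper
            fetched.extend(out)
        return fetched

    def _reduce(self, task_id, records):
        # Reduce phase: sorter memory plus real computation, so it is gated.
        with self._lock:
            self.active_reduces += 1
            self.peak_reduces = max(self.peak_reduces, self.active_reduces)
        try:
            time.sleep(0.005)  # stand-in for sort + user reduce work
            return sum(sorted(records))
        finally:
            with self._lock:
                self.active_reduces -= 1

    def run_task(self, task_id, map_outputs):
        # One thread per task: shuffle first; if it raises, reduce never runs.
        records = self._shuffle(task_id, map_outputs)
        with self._reduce_slots:
            result = self._reduce(task_id, records)
        with self._lock:
            self.results[task_id] = result


def run_demo(num_tasks=8, max_reduce_phases=2):
    """Start every task at once and wait for all of them to finish."""
    sched = ReduceSlotScheduler(max_reduce_phases)
    threads = [
        threading.Thread(target=sched.run_task, args=(i, [[i, 1], [2]]))
        for i in range(num_tasks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sched
```

with `run_demo(8, 2)` all eight tasks can be shuffling at the same time, but `peak_reduces` never goes above 2. the design choice is just that the shuffle doesn't hold a reduce slot while it waits.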

paul

On 7/24/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
Eric Baldeschwieler wrote:
> Of course interleaving the sort with the copy phase would also be
> desirable...
>
> But I'm all for clearly IDing reduces vs shuffle.

I think this is mostly a terminology problem.

There is a 1:1 correspondence between shuffle tasks and reduce tasks,
and a strict ordered dependency.  There's no advantage in trying to
separate their implementations: we need to start a thread to manage
first a shuffle and then, immediately after, if the shuffle succeeds, a
reduce.  So this may as well be the same thread.

So I don't think we need a ShuffleTask class, separately scheduled by
the TaskTracker, but, rather, we just need to start calling the first
part of the reduce task progress "shuffle".  Thus the only fix needed
is to the progress-reporting code.
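A minimal sketch of what "calling the first part of the reduce task progress 'shuffle'" could look like, in Python rather than Hadoop's actual reporting code. `ReduceTaskProgress` and its fields are hypothetical, and the 50/50 split between the two phases (`shuffle_weight`) is an assumption for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ReduceTaskProgress:
    """Hypothetical progress reporter: one task, two named phases."""
    maps_total: int = 1
    maps_copied: int = 0
    reduce_fraction: float = 0.0
    shuffle_weight: float = 0.5  # assumed share of overall progress for shuffle

    def __post_init__(self):
        if self.maps_total <= 0:
            raise ValueError("maps_total must be positive")

    def phase(self):
        # Still copying map outputs -> "shuffle"; otherwise -> "reduce".
        return "shuffle" if self.maps_copied < self.maps_total else "reduce"

    def overall(self):
        # Single progress number for the whole task, spanning both phases.
        if self.phase() == "shuffle":
            return self.shuffle_weight * (self.maps_copied / self.maps_total)
        return self.shuffle_weight + (1 - self.shuffle_weight) * self.reduce_fraction

    def status(self):
        return f"{self.phase()} {self.overall():.0%}"
```

For example, a task that has copied 2 of 4 map outputs reports `shuffle 25%`; once all 4 are copied and the reduce is half done, it reports `reduce 75%`. The task and thread stay the same; only the label and the progress arithmetic change.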

Doug