Eric Baldeschwieler wrote:
> Of course interleaving the sort with the copy phase would also be
> desirable...
> But I'm all for clearly IDing reduces vs shuffle.
I think this is mostly a terminology problem.
There is a 1:1 correspondence between shuffle tasks and reduce tasks,
and a strictly ordered dependency. There's no advantage in trying to
separate their implementations: we need to start a thread to manage
first a shuffle and then, immediately after, if the shuffle succeeds, a
reduce. So this may as well be the same thread.
So I don't think we need a ShuffleTask class, separately scheduled by
the TaskTracker, but, rather, we just need to start reporting the first
part of the reduce task's progress as "shuffle". Thus the only fix needed
is to the progress reporting code.
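
To make the idea concrete, here is a rough sketch of what I mean. It is
not the actual TaskTracker or ReduceTask code: the names
ReduceTaskSketch, Phase, Reporter and runTask are all hypothetical,
made up for illustration. It only shows one thread running the shuffle
and then the reduce, with the progress reports labelling which phase
the task is in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

public class ReduceTaskSketch {
    /** Hypothetical phase labels used only for progress reporting. */
    public enum Phase { SHUFFLE, REDUCE }

    /** Hypothetical progress sink; stands in for the real reporting code. */
    public interface Reporter {
        void report(Phase phase, String status);
    }

    /**
     * Runs the shuffle and then, only if it succeeds, the reduce,
     * both on the calling thread. Returns true if the reduce ran.
     */
    public static boolean runTask(BooleanSupplier shuffle, Runnable reduce,
                                  Reporter reporter) {
        reporter.report(Phase.SHUFFLE, "started");
        if (!shuffle.getAsBoolean()) {
            // Strictly ordered dependency: no reduce without a good shuffle.
            reporter.report(Phase.SHUFFLE, "failed");
            return false;
        }
        reporter.report(Phase.REDUCE, "started");
        reduce.run();
        reporter.report(Phase.REDUCE, "done");
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> log = new ArrayList<>();
        // One thread per reduce task; the shuffle is just its first phase.
        Thread t = new Thread(() ->
            runTask(() -> true, () -> {}, (p, s) -> log.add(p + ":" + s)));
        t.start();
        t.join();
        System.out.println(log);
    }
}
```

Note there is no separately scheduled shuffle task here; the only place
the word "shuffle" appears is in what gets reported.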
Doug