Jeremy,

A clarification: there is currently no mechanism in Hadoop to place particular tasks on particular nodes. Hadoop does not take a node's suitability for a given task into account; if one node has more CPU and another has more I/O capacity, you cannot specify that certain tasks should run on the CPU-heavy nodes and others on the I/O-heavy nodes.

Speculative execution, though, means that any tasks which are "left behind" near the end of a job will be re-executed in parallel on other "empty" nodes that are otherwise idle, waiting for the full job to complete. Whichever attempt finishes first is used. Hopefully, this second random placement will also land the task on a "correct" node, if the first assignment of tasks didn't.

By default, I think map task speculation is enabled, but reduce task speculation is disabled.

- Aaron

On Wed, Dec 24, 2008 at 1:12 AM, Devaraj Das <[email protected]> wrote:

> You can enable speculative execution for your jobs.
>
> On 12/24/08 10:25 AM, "Jeremy Chow" <[email protected]> wrote:
>
> > Hi list,
> > I've come up against a scenario like this, to finish a same task, one of my
> > hadoop cluster only needs 5 seconds, and another one needs more than 2
> > minutes.
> > It's a common phenomenon that will decrease the parallelism of our system
> > due to the faster one will wait the slower one. How to coordinate those
> > nodes of different computing powers in a same cluster?
> >
> > Thanks,
> > Jeremy
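
For reference, here is a minimal sketch of how the speculative execution discussed in this thread could be switched on explicitly rather than relying on the defaults. It assumes the property names used by Hadoop releases of this era (around 0.19); later releases renamed them, so check the defaults file that ships with your version. The fragment would go in your site configuration file or in a per-job configuration.

```xml
<!-- Sketch only: property names assumed from Hadoop releases around 0.19. -->
<!-- Verify both names against the defaults file of your own version. -->
<property>
  <!-- Allow duplicate attempts of slow map tasks on idle nodes. -->
  <name>mapred.map.tasks.speculative.execution</name>
  <value>true</value>
</property>
<property>
  <!-- Allow duplicate attempts of slow reduce tasks on idle nodes. -->
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>true</value>
</property>
```

Setting both values explicitly avoids any doubt about which defaults apply to your release. The same switches should also be available per job from Java through JobConf.setMapSpeculativeExecution(boolean) and JobConf.setReduceSpeculativeExecution(boolean). Speculation is only safe if your tasks have no side effects outside the framework, since the same task may run more than once.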
