Hi Vinod,

Thanks a lot for the info. That's good to know!

Yes, our model is based on 0.20. Could you possibly give us a pointer to
the main changes in the map/shuffle/reduce phases since 0.20? We'd be
excited to extend our cost-based optimization to YARN.

Jie

On Tue, Jan 24, 2012 at 12:02 AM, Vinod Kumar Vavilapalli <
[email protected]> wrote:

> On Mon, Jan 23, 2012 at 5:11 PM, Jie Li <[email protected]> wrote:
> > What we are looking for is more of the difference at the task level.
> > Suppose a map task takes 10 minutes in Hadoop; then we have a model to
> > analyse what makes up the 10 minutes, e.g. reading from HDFS, invoking
> > the map function, writing to the buffer, partitioning, sorting and
> > merging. This model can be used to identify the bottleneck of the task
> > execution and suggest better configurations.
>
>
> The task runtime hasn't changed from 0.21/0.22. It has changed if you
> compare with 0.20: the new runtime has a lot of performance
> improvements and is expected to be better with all the optimizations.
> To answer your question: yes, your 'model' shouldn't need any changes.
>
>
> > If we run MR jobs in YARN, can we use the same model to analyse the
> > running time of a task? One possible difference I've noticed so far is
> > that the shuffling has become a service of the node manager. Any other
> > change related to the map phase or reduce phase?
>
> Shuffle used to be part of the TaskTracker; it is now in the NM.
> Apart from that, there isn't much difference that should affect you.
>
> HTH,
> +Vinod
>
>
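For what it's worth, the per-task cost model discussed above could be sketched roughly as follows. This is only an illustration of the idea (total task time as a sum of per-phase costs, with the bottleneck being the largest term); the phase names and the numbers are made-up assumptions, not measurements from any Hadoop version.

```python
# Hypothetical sketch of a per-task cost model: total map-task time
# as the sum of its phase costs. Phase names follow the breakdown
# described in the thread; all numbers are illustrative.

MAP_PHASES = ["hdfs_read", "map_function", "buffer_write",
              "partition", "sort", "merge"]

def task_time(phase_costs):
    """Sum per-phase costs (in seconds) into a total task time."""
    return sum(phase_costs[p] for p in MAP_PHASES)

def bottleneck(phase_costs):
    """Return the phase contributing the most time."""
    return max(MAP_PHASES, key=lambda p: phase_costs[p])

# Example: a 10-minute map task broken down by phase (assumed values).
costs = {"hdfs_read": 120, "map_function": 60, "buffer_write": 90,
         "partition": 30, "sort": 180, "merge": 120}
print(task_time(costs))   # 600 seconds = 10 minutes
print(bottleneck(costs))  # sort
```

A real model would of course derive each phase cost from job configuration and data statistics rather than fixed inputs, but the structure (per-phase decomposition, then bottleneck identification) is the same one the thread describes, and it is unaffected by shuffle moving from the TaskTracker into the NodeManager.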
