Hi Vinod,

Thanks a lot for the info. That's good to know!

Yeah, our model is based on 0.20. Could you possibly give us a pointer to the main changes related to the map/shuffle/reduce phases since 0.20? We'd be excited to extend our cost-based optimization to YARN.

Jie

On Tue, Jan 24, 2012 at 12:02 AM, Vinod Kumar Vavilapalli <[email protected]> wrote:

> On Mon, Jan 23, 2012 at 5:11 PM, Jie Li <[email protected]> wrote:
>
> > What we are looking for is more of the difference at the task level.
> > Suppose a map task takes 10 minutes in Hadoop; then we have a model to
> > analyse what makes up the 10 minutes, e.g. reading from HDFS, invoking
> > the map function, writing to the buffer, partitioning, sorting and
> > merging. This model can be used to identify the bottleneck of the task
> > execution and suggest better configurations.
>
> The task runtime hasn't changed from 0.21/0.22. But it has changed if
> you compare with 0.20: the new runtime has a lot of performance
> improvements and is expected to be better with all the optimizations.
> To answer your question, yes, your 'model' shouldn't need any changes.
>
> > If we run MR jobs on YARN, can we use the same model to analyse the
> > running time of a task? One possible difference I've noticed so far is
> > that the shuffling has become a service of the node manager. Any other
> > changes related to the map phase or reduce phase?
>
> Shuffle used to be part of the TaskTracker; it is now in the NM.
> Except for that, there isn't much difference that should affect you.
>
> HTH,
> +Vinod
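For readers following the thread, the task-level breakdown Jie describes could be sketched as a simple cost model: record the time spent in each phase of a map task and find the dominant one. This is only an illustrative sketch; the phase names follow the list in the quoted email, and the timings are hypothetical profiling numbers, not real Hadoop measurements or any Hadoop API.

```python
# Sketch of a phase-based cost model for a single map task, as
# discussed in the thread. Phase names mirror the email's breakdown;
# all numbers below are made up for illustration.

def bottleneck(phase_times):
    """Return (phase, share) for the phase with the largest share of
    total task time."""
    total = sum(phase_times.values())
    phase, t = max(phase_times.items(), key=lambda kv: kv[1])
    return phase, t / total

# Hypothetical per-phase timings (seconds) for a 10-minute map task.
map_task = {
    "read_hdfs": 120.0,       # reading the input split from HDFS
    "map_function": 90.0,     # invoking the user map() function
    "write_buffer": 60.0,     # writing output into the sort buffer
    "partition_sort": 180.0,  # partitioning and in-memory sorting
    "merge_spills": 150.0,    # merging spill files on local disk
}

phase, share = bottleneck(map_task)
print(phase, round(share, 2))  # the dominant phase and its fraction
```

Per Vinod's reply, a model of this shape carries over to YARN unchanged, since only the home of the shuffle service (TaskTracker vs. NodeManager) moved, not the per-task phase structure.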
