Hey Edward,

from what you describe, you are talking about some kind of speculative task execution. This is also possible in YARN, if you have free resources.
Overall, the running time of a task depends heavily on how much work it has to do, so splitting the data evenly should give every task a similar running time and keep the barrier from waiting on a straggler. We can also reduce the time spent in synchronization by using asynchronous messaging: we send messages during the computation phase and, in the sync phase, transfer only the remainder that has not yet been sent. This should give a performance boost, especially in messaging-heavy jobs like SSSP.

2011/10/25 Edward J. Yoon <[email protected]>

> Hi,
>
> I heard that the task scheduling will be most important factor for
> high performance on large cluster since the barrier waits for the
> slowest task. What do you think about this?
>
> P.S., If user use YARN cluster, BSP task scheduling will be done by
> their resource management system.
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon

--
Thomas Jungblut
Berlin <[email protected]>
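
P.S. To put rough numbers on the two points above, here is a toy model in Python. It is only a sketch, not Hama code: the function names, the unit cost per item, and the assumption that message sending overlaps perfectly with computation are all made up for illustration.

```python
def superstep_time(work_per_task, cost_per_item=1.0):
    """The barrier releases only when the slowest task has finished,
    so the superstep takes as long as the largest task."""
    return max(items * cost_per_item for items in work_per_task)


def split_evenly(total_items, num_tasks):
    """Distribute items so that task sizes differ by at most one."""
    base, extra = divmod(total_items, num_tasks)
    return [base + (1 if i < extra else 0) for i in range(num_tasks)]


def sync_time(message_time, compute_time, asynchronous):
    """Time left for message transfer in the sync phase.

    Assumption: with asynchronous messaging, sending overlaps fully
    with computation, so only the part that did not fit remains.
    """
    if asynchronous:
        return max(0.0, message_time - compute_time)
    return message_time


if __name__ == "__main__":
    skewed = [70, 10, 10, 10]        # one task holds most of the data
    even = split_evenly(100, 4)      # same 100 items, balanced
    print("skewed split:", superstep_time(skewed))
    print("even split:  ", superstep_time(even))
    print("sync, blocking:", sync_time(30, 20, asynchronous=False))
    print("sync, async:   ", sync_time(30, 20, asynchronous=True))
```

In this model the skewed split is bounded by the 70-item task while the even split is bounded by 25 items, and overlapping the messaging leaves only the leftover transfer for the sync phase. Real jobs will not overlap this cleanly, so treat the numbers as an upper bound on the benefit.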
