I will try to provide a patch so that we can have a baseline for discussion.

-----Original message-----
From: Edward J. Yoon <[email protected]>
To: [email protected], [email protected]
Date: Tue, 20 Sep 2011 15:49:20 +0900
Subject: Re: [Discussion] Refactor bsp() for recovery procedure

> The disadvantage may be the client programme need to explicitly tell the
> order of superstep.

If a user wants to call sync() repeatedly in a loop, while or until a
condition is true, how would they program it?

bsp() {
  while (condition is true) {
    doLocalComputation();
    communicationWith(others);
    sync();
  }
}

I think the current BSP programming interface is very good. If this is only
for recovery, we have to find another way.

2011/9/19 ChiaHung Lin <[email protected]>:
> Currently we have bsp() where users can code for performing their tasks. For
> instance,
>
> ... bsp() ...{
>   ... // some computation
>   sync();
>   ... // some other computation
>   sync();
>   ...
> }
>
> However, this is difficult for recovery because 1st, it requires checkpointed
> messages to be recovered so that the computation can be resumed from where it
> fails; 2nd, the recovery procedure needs to know from which superstep to
> restart. With the current bsp(), it seems a common choice is preprocessing;
> but this may not be good because when something goes wrong internally, it
> is not easy to find out the problem.
>
> I came up with an alternative method, but it would require a change to our
> current procedure. So I think it would be good to discuss it first. It is
> proposed as below:
>
> 1. we divide bsp() into smaller computation units called e.g. step() or
> superstep(), within which users still write their own logic.
>
> 2. in main, the user composes the order of supersteps.
>
> ... class Superstep1 extends BSPSuperstep {
>   ... superstep() ... {...}
> }
> ... class Superstep2 extends BSPSuperstep {
>   ... superstep() ... {...}
> }
>
> BSPJob bsp = new BSP(...);
> bsp.compose(Superstep1.class).compose(Superstep2.class)...;
>
> Therefore, for recovery, in BSPTask run() we can have
>
> List<BSPSuperstep> steps = BSPJob.supersteps();
>
> for(BSPSuperstep step: steps) {
>   if(checkpointed) {
>     // restore checkpointed messages e.g. adding checkpointed msg (in hdfs)
>     // back to queues
>   }
>   step.superstep(...);
>   step.sync();
> }
>
> The advantage is that the recovery procedure becomes easier.
> The disadvantage may be that the client program needs to explicitly state the
> order of supersteps.
>
> Any thoughts?
>
> --
> ChiaHung Lin
> Department of Information Management
> National University of Kaohsiung
> Taiwan
>

--
Best Regards, Edward J. Yoon
@eddieyoon

--
ChiaHung Lin
Department of Information Management
National University of Kaohsiung
Taiwan
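
As a possible baseline for the discussion, here is a minimal sketch of how
composed supersteps might still cover the loop case raised in the thread, and
how a runner could resume from a known superstep. Everything in it is
hypothetical: Superstep, run(), demoJob() and the "return the next index"
convention are illustrative assumptions, not the Hama API and not the patch
mentioned above. The "sync" log entry only marks where the barrier would be,
and restoring checkpointed messages is left as a comment.

```java
import java.util.ArrayList;
import java.util.List;

public class SuperstepSketch {

    /** Hypothetical stand-in for the proposed BSPSuperstep; not the Hama API. */
    interface Superstep {
        /**
         * Runs one superstep of local computation.
         * Returns the index of the superstep to run next, or -1 to finish.
         */
        int superstep(int self, List<String> log);
    }

    /** Runs steps from startIndex; returns the number of supersteps executed. */
    static int run(List<Superstep> steps, int startIndex, List<String> log) {
        int executed = 0;
        int current = startIndex;
        while (current >= 0 && current < steps.size()) {
            // A real runner would restore checkpointed messages here
            // before resuming at 'current' (assumption, not implemented).
            int next = steps.get(current).superstep(current, log);
            log.add("sync"); // placeholder for the barrier synchronisation
            executed++;
            current = next;
        }
        return executed;
    }

    /** Builds a two-step job whose first step repeats 'rounds' times. */
    static List<Superstep> demoJob(int rounds) {
        int[] counter = {0};
        List<Superstep> steps = new ArrayList<>();
        steps.add((self, log) -> {
            log.add("compute");
            counter[0]++;
            // Returning its own index repeats this superstep (the loop case).
            return counter[0] < rounds ? self : self + 1;
        });
        steps.add((self, log) -> {
            log.add("finish");
            return -1;
        });
        return steps;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        int executed = run(demoJob(3), 0, log);
        System.out.println(executed + " supersteps: " + log);
    }
}
```

Calling run(steps, k, log) with k taken from a checkpoint would restart at
superstep k instead of 0. The trade-off is the one already noted in the
thread: the user has to express control flow through the returned index
rather than an ordinary while loop inside bsp().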
