I will try to provide a patch so that we can have a baseline for discussion. 

-----Original message-----
From:Edward J. Yoon <[email protected]>
To:[email protected],[email protected]
Date:Tue, 20 Sep 2011 15:49:20 +0900
Subject:Re: [Discussion] Refactor bsp() for recovery procedure

> The disadvantage may be the client programme need to explicitly tell the 
> order of superstep.

If user want to call a sync() method repeatedly in the loops while or
until a condition is true, how to program it?

bsp() {

  while (condition is true) {
    doLocalComputation();
    communicationWith(others);
    sync();
  }

}

I think, current BSP programming interface is very good. If it's just
only for recovery, we have to find another way.

2011/9/19 ChiaHung Lin <[email protected]>:
> Currently we have bsp() where users can code for performing thier tasks. For 
> instance,
>
> ... bsp() ...{
>   ... // some computation
>   sync();
>   ... // some other computation
>   sync();
>   ...
> }
>
> However, this is difficult for recovery because 1st, it requires checkpointed 
> messages to be recovered so that the computation can be resumed from where it 
> fails; 2nd, the recovery procedure needs to know from which super step to 
> restart. With the current bsp(), it seems a common choice is preprocessing; 
> but this may not be good because when internally something goes wrong it, it 
> is not easy to find out the problem.
>
> I come up with an alternative method but this would have change to the way of 
> our current procedure. So I think it would be good to discuss it first. It is 
> proposed as below:
>
> 1. we divide bsp() into smaller computation unit called e.g. step() or 
> superstep(), within which user still write their own logic.
>
> 2. in main, user composes the order of supersteps.
>
> ... class Superstep1 extends BSPSuperstep {
>   ... superstep() ... {...}
> }
> ... class Superstep2 extends BSPSuperstep {
>   ... superstep() ... {...}
> }
>
> BSPJob bsp = new BSP(...);
> bsp.compose(Superstep1.class).compose(Superstep2.class)...;
>
> Therefore, when recovery, in BSPTask run() we can have
>
> List<BSPSuperstep> steps = BSPJob.supersteps();
>
> for(BSPSuperstep step: steps) {
>   if(checkpointed) {
>     // restore checkpointed messages e.g. adding checkpointed msg (in hdfs) 
> back to queues
>   }
>   step.superstep(...);
>   step.sync();
> }
>
> The advantage is easier for recovery procedure.
> The disadvantage may be the client programme need to explicitly tell the 
> order of superstep.
>
> Any thought?
>
> --
> ChiaHung Lin
> Department of Information Management
> National University of Kaohsiung
> Taiwan
>



-- 
Best Regards, Edward J. Yoon
@eddieyoon


--
ChiaHung Lin
Department of Information Management
National University of Kaohsiung
Taiwan

Reply via email to