I slightly disagree with the "easily recoverable" part. Consider the following
code snippet:

bsp() {
  var i, j, k;
  compute1()
  sync()
  compute2()
  sync()
  for(...) {
    computex(i, k)
    sync()
    computey(j)
    sync()
  } // for
}

Suppose it has 43 supersteps and has checkpointed data at the 23rd superstep, 
and then the bsp task crashes. The steps to recover may include 1.) analyzing 
the source to count the sync() calls needed to reach the 23rd superstep, and 
2.) having the main thread find a way to jump to that point in the function 
and feed in the checkpoint data, while perhaps also ensuring that this does 
not violate atomicity with variables somewhere else. 

The reason why I think recovery might be easier with a slightly finer-grained 
unit is that we can recover by feeding the checkpointed messages back to a 
superstep directly (as below). (Of course this is not the only way; we can 
discuss it and probably find a better solution.)

// in framework
Superstep step ...;
if(recovered) {
  step = supersteps.get(22) // the 23rd superstep (0-based index)
  step.recover(checkpointedData)
}
...


superstep() {
  if(recovered) {
    ... getCheckpointedMessage()
    // do something
  }
}
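
To make the idea above a bit more concrete, here is a minimal Java sketch. 
Everything in it is hypothetical and only for discussion: Superstep, 
SuperstepRunner, recover() and resume() are names I made up, not existing Hama 
API, and I assume the checkpoint holds the messages that were the input of the 
superstep we resume at.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical unit of work between two barriers; not an existing Hama API. */
interface Superstep {
    /** Normal execution: consumes incoming messages, returns outgoing ones. */
    List<Integer> compute(List<Integer> incoming);

    /** Recovery entry point; by default it just re-runs compute() on the checkpointed messages. */
    default List<Integer> recover(List<Integer> checkpointedMessages) {
        return compute(checkpointedMessages);
    }
}

/** Hypothetical framework-side driver that owns the ordered list of supersteps. */
final class SuperstepRunner {
    private final List<Superstep> supersteps;

    SuperstepRunner(List<Superstep> supersteps) {
        this.supersteps = supersteps;
    }

    /** Runs every superstep from the beginning. */
    List<Integer> run(List<Integer> input) {
        return runFrom(0, input, false);
    }

    /** Resumes at the given 0-based index, feeding it the checkpointed messages. */
    List<Integer> resume(int index, List<Integer> checkpointedMessages) {
        return runFrom(index, checkpointedMessages, true);
    }

    private List<Integer> runFrom(int start, List<Integer> messages, boolean recovered) {
        List<Integer> current = messages;
        for (int i = start; i < supersteps.size(); i++) {
            Superstep step = supersteps.get(i);
            // Only the first superstep after a crash goes through recover().
            current = (recovered && i == start) ? step.recover(current) : step.compute(current);
            // A real framework would do the barrier sync and checkpointing here.
        }
        return current;
    }
}

final class RecoveryDemo {
    /** Builds n toy supersteps; each one just increments the single message it receives. */
    static SuperstepRunner buildRunner(int n) {
        List<Superstep> steps = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            steps.add(incoming -> Collections.singletonList(incoming.get(0) + 1));
        }
        return new SuperstepRunner(steps);
    }

    public static void main(String[] args) {
        SuperstepRunner runner = buildRunner(43);
        List<Integer> full = runner.run(Collections.singletonList(0));
        // Resume at index 22 (the 23rd superstep) with the messages it originally received.
        List<Integer> resumed = runner.resume(22, Collections.singletonList(22));
        System.out.println(full.equals(resumed));
    }
}
```

The point is only that the framework knows the index of each superstep, so 
nobody has to count sync() calls in user code in order to find the place to 
resume.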

As for sync(), it is not necessary to separate sync() from the superstep; we 
can have functions that let users specify e.g. syncBefore(), syncAfter(), etc. 
when a superstep is called.
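
A rough Java sketch of how such hooks might look. This is an assumption on my 
side, not existing Hama API: SyncAwareSuperstep, SyncAwareRunner and the 
defaults chosen for syncBefore()/syncAfter() are made up, and barrier() only 
records a trace entry instead of doing a real barrier synchronization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical superstep that declares where it wants a barrier. */
interface SyncAwareSuperstep {
    void compute();

    /** Assumed default: no barrier before the superstep. */
    default boolean syncBefore() {
        return false;
    }

    /** Assumed default: one barrier after the superstep, as in classic BSP. */
    default boolean syncAfter() {
        return true;
    }
}

/** Hypothetical framework-side driver that places the barriers for the user. */
final class SyncAwareRunner {
    private final List<String> trace = new ArrayList<>();

    /** Stand-in for the real barrier synchronization; it only records the event. */
    private void barrier() {
        trace.add("sync");
    }

    List<String> run(List<SyncAwareSuperstep> steps) {
        for (int i = 0; i < steps.size(); i++) {
            SyncAwareSuperstep step = steps.get(i);
            if (step.syncBefore()) {
                barrier();
            }
            step.compute();
            trace.add("compute" + i);
            if (step.syncAfter()) {
                barrier();
            }
        }
        return trace;
    }
}

final class SyncHooksDemo {
    static List<String> demo() {
        // Uses the defaults: barrier after only.
        SyncAwareSuperstep plain = () -> { };
        // Overrides syncBefore(), so it gets a barrier on both sides.
        SyncAwareSuperstep both = new SyncAwareSuperstep() {
            @Override
            public void compute() { }

            @Override
            public boolean syncBefore() {
                return true;
            }
        };
        return new SyncAwareRunner().run(Arrays.asList(plain, both));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With these defaults the second superstep ends up with two barriers in a row 
(the first one's syncAfter() and its own syncBefore()); a real implementation 
would probably want to merge those.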


-----Original message-----
From:Thomas Jungblut <[email protected]>
To:[email protected]
Date:Tue, 29 Nov 2011 07:24:45 +0100
Subject:Re: Reset Input RecordReader

Yep, it is just a reopen. Let's call it that. I'm going to make up a
patch later.
So it is just a re-read of the same assigned split. No problem ;)

Yes, BSP is not atomic, but as long as the user sticks with the
communication and the stuff from IO (not keeping state in fields such as a
hashmap, like pagerank does) this is always easily recoverable.
But you cannot express every algorithm with just one sync at the end of a
function, so bsp() must be somewhere anyway.
For me it is a question of algorithm design: as long as you use the major
parts of our framework, this is fail-safe.


2011/11/29 ChiaHung Lin <[email protected]>

> Does it mean that in each iteration the computation (the code within the bsp
> function) needs to read the same input or a different one?
>
> I ask because this seems related to what I mentioned previously about
> reworking the bsp function (providing a smaller computation unit, e.g.
> superstep).
>
> bsp(...) {
> sync()
> // superstep 1
> // read from hdfs
> // compute1()
> // send messages ...
> sync()
> // superstep 2
> // read from/ write to pvfs
> // compute2()
> sync()
> // superstep 3
> // write to cassandra
> // compute3()
> sync()
> ...
> }
>
> The reason is that bsp() consists of several supersteps, and in each
> iteration users probably want to read from/ write to a different
> input/ output. This is a pattern. Although the current bsp() is flexible,
> allowing users to write whatever they want within bsp(), the disadvantages
> I observe include 1.) recovery is difficult and 2.) a lot of code gets mixed
> together within one function.
>
> The first one may be overcome by source code instrumentation, but that is
> not a good solution because users would not know what goes wrong, or where,
> when bsp() doesn't function well.
>
> The second one is more minor, and the code can e.g. be reorganized in a
> more modular way. But that ends up looking similar to what we would get by
> providing e.g. superstep().
>
> Just some thoughts.
>
> -----Original message-----
> From:Thomas Jungblut <[email protected]>
> To:[email protected]
> Date:Tue, 29 Nov 2011 04:39:38 +0100
> Subject:Reset Input RecordReader
>
> Hi all,
>
> I need some kind of reset-logic for the input of a BSP Job.
> It should be quite easy to add:
> - add a method called resetInput() in BSPPeer
> - in concrete implementation it just closes the input split and opens it
> again
>
> If you're interested why I need this, I'm currently writing a k-means
> clustering in BSP.
> I need to iterate over all vectors from the input and measure distance
> against a set of centers in each superstep, so it would help me to "reset"
> the input.
>
> Do you think I can add this right away into the trunk?
>
> --
> Thomas Jungblut
> Berlin <[email protected]>
>
>
> --
> ChiaHung Lin
> Department of Information Management
> National University of Kaohsiung
> Taiwan
>



-- 
Thomas Jungblut
Berlin <[email protected]>


--
ChiaHung Lin
Department of Information Management
National University of Kaohsiung
Taiwan
