Oh yes, there was that little problem ;) Thanks for the reminder. Would your "fix" be to let the user implement several chained computation methods?
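To make the question concrete, here is a minimal sketch in Java of what "several chained computation methods" might look like. All names in it (Superstep, SuperstepChain, run, resumeFrom) are hypothetical and not part of the current framework API. Messages are plain strings, and the real barrier sync is only marked by a comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: one unit of computation between two barrier syncs.
interface Superstep {
    // Takes the messages delivered by the previous superstep (or restored
    // from a checkpoint) and returns the messages for the next superstep.
    List<String> compute(List<String> incoming);
}

// Hypothetical framework-side driver that runs the supersteps in order.
final class SuperstepChain {
    private final List<Superstep> steps = new ArrayList<>();

    SuperstepChain add(Superstep step) {
        steps.add(step);
        return this;
    }

    // Normal run: start at superstep 0 with no incoming messages.
    List<String> run() {
        return resumeFrom(0, new ArrayList<>());
    }

    // Recovery: jump straight to the given superstep index and feed it the
    // checkpointed messages, without replaying the earlier supersteps.
    List<String> resumeFrom(int index, List<String> checkpointed) {
        List<String> messages = checkpointed;
        for (int i = index; i < steps.size(); i++) {
            messages = steps.get(i).compute(messages);
            // A real framework would perform the barrier sync() here.
        }
        return messages;
    }
}
```

With this shape the framework knows which superstep it is in, so recovery from a checkpoint taken at the 23rd superstep would be a call such as resumeFrom(22, checkpointedData) instead of counting sync() calls inside one big bsp(). User fields kept outside the messages would still not be restored.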
2011/11/29 ChiaHung Lin <[email protected]>

> Slightly disagree with easy recoverable part. Considering the following
> code snippet
>
> bsp() {
>   var i,j,k;
>   compute1()
>   sync()
>   compute2()
>   sync()
>   for(...) {
>     computex(i, k)
>     sync()
>     computey(j)
>     sync()
>   }// for
> }
>
> Suppose it has 43 supersteps. And it has checkpointed data at the 23th
> superstep, then bsp task crashes. So steps to recover may include 1.)
> analyze source to ensure the number of sync() reaching to the superstep
> 23th. 2.) main thread need to find a way going to that function and feeding
> the checkpoint data and maybe also ensure it does not violate some
> atomicity with variables some where else.
>
> The reason why I think it might be easier for recovery with a bit fine
> grained unit is because we can achieve by feeding checkpointed messages
> back to a superstep directly (as below). (Of course this is not the only
> way, we can discuss and probably find out a better solution)
>
> // in framework
> Superstep step ...;
> if(recovered) {
>   step = supersteps.get(22)
>   step.recover(checkpointedData)
> }
> ...
>
>
> superstep() {
>   if(recovered) {
>     ... getCheckpointedMessage()
>     // do something
>   }
> }
>
> For sync(), it is not necessary to separate sync() from superstep, so we
> can have functions allowing users to specify e.g. syncBefore(),
> syncAfter(), etc. when a superstep is called.
>
>
> -----Original message-----
> From:Thomas Jungblut <[email protected]>
> To:[email protected]
> Date:Tue, 29 Nov 2011 07:24:45 +0100
> Subject:Re: Reset Input RecordReader
>
> Yep, it is just a reopen. Let's call it like this. I'm going to make up a
> patch later.
> Therefore it is just the read of the same assigned split. So no problem ;)
>
> Yes BSP is not atomic, but as long as the user sticks with the
> communication and the stuff from IO (not using fields in a hashmap like
> pagerank or so) this is always easy recoverable.
> But you cannot express every algorithm with just one sync at the end of a
> function, so BSP() must be somewhere anyways.
> For me it is a question of algorithm design, as long as you use major parts
> from our framework, this is fail safe.
>
>
> 2011/11/29 ChiaHung Lin <[email protected]>
>
> > Do it mean for each iteration the computation (code within bsp function)
> > requires to read the same or different input?
> >
> > I have this questions is because it seems to me having related to what
> > previously I mentioned regarding to the rework of bsp function (providing a
> > smaller computation unit e.g. superstep).
> >
> > bsp(...) {
> >   sync()
> >   // superstep 1
> >   // read from hdfs
> >   // compute1()
> >   // send messages ...
> >   sync()
> >   // superstep 2
> >   // read from/ write pvfs
> >   // compute2()
> >   sync()
> >   // superstep 3
> >   // write to cassandra
> >   // compute3()
> >   sync()
> >   ...
> > }
> >
> > The reason is because within bsp() it consists of several supersteps. And
> > for each iteration, users probably want to read from/ write to different
> > input/ output. This is a pattern. Although current bsp() is flexible
> > allowing users to write whatever they want within bsp(), the disadvantage I
> > observe include 1.) difficult for recovery 2.) many code mixed up together
> > within one function.
> >
> > The first one may be overcome by source code instrumentation but that is
> > not a good solution because users do not know what/ where goes wrong when
> > bsp() doesn't function well.
> >
> > The second one is a bit minor, and can be e.g. reorganized in a more
> > modular way. But this looks similar to the way if we provide e.g
> > superstep().
> >
> > Just some thoughts.
> >
> > -----Original message-----
> > From:Thomas Jungblut <[email protected]>
> > To:[email protected]
> > Date:Tue, 29 Nov 2011 04:39:38 +0100
> > Subject:Reset Input RecordReader
> >
> > Hi all,
> >
> > I need some kind of reset-logic for the input of a BSP Job.
> > It should be quite easy to add:
> > - add a method called resetInput() in BSPPeer
> > - in concrete implementation it just closes the input split and opens it
> >   again
> >
> > If you're interested why I need this, I'm currently writing a k-means
> > clustering in BSP.
> > I need to iterate over all vectors from the input and measure distance
> > against a set of centers in each superstep, so it would help me to "reset"
> > the input.
> >
> > Do you think I can add this right away into the trunk?
> >
> > --
> > Thomas Jungblut
> > Berlin <[email protected]>
> >
> >
> > --
> > ChiaHung Lin
> > Department of Information Management
> > National University of Kaohsiung
> > Taiwan
> >
> >
>
>
> --
> Thomas Jungblut
> Berlin <[email protected]>
>
>
> --
> ChiaHung Lin
> Department of Information Management
> National University of Kaohsiung
> Taiwan
>

--
Thomas Jungblut
Berlin <[email protected]>
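
On the resetInput() proposal from the first mail of this thread, here is a minimal sketch in Java of the close-and-reopen idea. ReopenableInput and its methods are hypothetical names and do not reflect the BSPPeer API. The assigned input split is stood in for by any source that can hand out a fresh iterator, and closing the underlying resource is left out.

```java
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical wrapper around one input split that can be read repeatedly.
final class ReopenableInput<T> {
    // Opens the same assigned split again each time it is called.
    private final Supplier<Iterator<T>> opener;
    private Iterator<T> current;

    ReopenableInput(Supplier<Iterator<T>> opener) {
        this.opener = opener;
        this.current = opener.get();
    }

    boolean hasNext() {
        return current.hasNext();
    }

    T next() {
        return current.next();
    }

    // "Reset" is just a reopen: drop the current reader and start again at
    // the first record. A real reader would also be closed here.
    void reopen() {
        current = opener.get();
    }
}
```

For the k-means case, each superstep would read every vector once, compare it against the current centers, and call reopen() before the next superstep.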
