Yep, it is just a reopen, so let's call it that. I'll put together a patch later. It only re-reads the same assigned split, so no problem ;)
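To make the "just a reopen" idea concrete, here is a minimal sketch. It is not the Hama API: `ResettableReader`, `readNext()` and the `Supplier`-based opener are hypothetical names for illustration, and an in-memory string stands in for the assigned input split. Only the name `resetInput()` comes from the proposal.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical reader that is "reset" by closing and reopening its split. */
class ResettableReader implements AutoCloseable {
    private final Supplier<BufferedReader> opener; // knows how to open the assigned split
    private BufferedReader current;

    ResettableReader(Supplier<BufferedReader> opener) {
        this.opener = opener;
        this.current = opener.get();
    }

    /** Returns the next record, or null at the end of the split. */
    String readNext() throws IOException {
        return current.readLine();
    }

    /** Closes the split and opens it again, so reading restarts at the first record. */
    void resetInput() throws IOException {
        current.close();
        current = opener.get();
    }

    @Override
    public void close() throws IOException {
        current.close();
    }
}

class ResetInputSketch {
    /** Reads every remaining record of the split. */
    static List<String> readAll(ResettableReader reader) throws IOException {
        List<String> records = new ArrayList<>();
        String line;
        while ((line = reader.readNext()) != null) {
            records.add(line);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String split = "1.0 2.0\n3.0 4.0\n"; // stand-in for the assigned split
        try (ResettableReader reader =
                 new ResettableReader(() -> new BufferedReader(new StringReader(split)))) {
            List<String> firstPass = readAll(reader);
            reader.resetInput(); // same split, read again from the start
            List<String> secondPass = readAll(reader);
            System.out.println(firstPass.equals(secondPass));
        }
    }
}
```

Because the reset reopens the same split, each pass sees the same records in the same order.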
Yes, BSP is not atomic, but as long as the user sticks to the communication and IO facilities (and does not keep state in fields, such as a hashmap in PageRank), this is always easily recoverable. But you cannot express every algorithm with just one sync at the end of a function, so bsp() has to be somewhere anyway. For me it is a question of algorithm design: as long as you use the major parts of our framework, this is fail-safe.

2011/11/29 ChiaHung Lin <[email protected]>

> Does it mean that for each iteration the computation (code within the bsp
> function) needs to read the same or different input?
>
> I ask because it seems related to what I mentioned previously regarding
> the rework of the bsp function (providing a smaller computation unit,
> e.g. superstep).
>
> bsp(...) {
>   sync()
>   // superstep 1
>   // read from hdfs
>   // compute1()
>   // send messages ...
>   sync()
>   // superstep 2
>   // read from/ write pvfs
>   // compute2()
>   sync()
>   // superstep 3
>   // write to cassandra
>   // compute3()
>   sync()
>   ...
> }
>
> The reason is that bsp() consists of several supersteps, and for each
> iteration users probably want to read from/ write to different
> input/ output. This is a pattern. Although the current bsp() is flexible,
> allowing users to write whatever they want within bsp(), the disadvantages
> I observe include 1.) recovery is difficult 2.) a lot of code is mixed
> together within one function.
>
> The first one may be overcome by source code instrumentation, but that is
> not a good solution because users do not know what/ where things go wrong
> when bsp() doesn't function well.
>
> The second one is minor and can be reorganized in a more modular way, but
> that looks similar to what we would get by providing e.g. superstep().
>
> Just some thoughts.
>
> -----Original message-----
> From: Thomas Jungblut <[email protected]>
> To: [email protected]
> Date: Tue, 29 Nov 2011 04:39:38 +0100
> Subject: Reset Input RecordReader
>
> Hi all,
>
> I need some kind of reset-logic for the input of a BSP Job.
> It should be quite easy to add:
> - add a method called resetInput() in BSPPeer
> - in concrete implementation it just closes the input split and opens it
>   again
>
> If you're interested why I need this, I'm currently writing a k-means
> clustering in BSP.
> I need to iterate over all vectors from the input and measure distance
> against a set of centers in each superstep, so it would help me to "reset"
> the input.
>
> Do you think I can add this right away into the trunk?
>
> --
> Thomas Jungblut
> Berlin <[email protected]>
>
>
> --
> ChiaHung Lin
> Department of Information Management
> National University of Kaohsiung
> Taiwan
>

--
Thomas Jungblut
Berlin <[email protected]>
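For reference, the superstep idea discussed in this thread (splitting bsp() into smaller units so that recovery can resume at a superstep boundary) could look roughly like the sketch below. It is an illustration under assumptions, not the Hama API: `Superstep`, `SuperstepRunner` and `completedSteps()` are hypothetical names, and a log entry stands in for the real barrier synchronization.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical unit of work between two barrier synchronizations. */
interface Superstep {
    void compute(List<String> log);
}

/** Hypothetical runner that remembers which supersteps have completed. */
class SuperstepRunner {
    private final List<Superstep> steps;
    private int nextStep = 0; // index of the first superstep that has not completed

    SuperstepRunner(List<Superstep> steps) {
        this.steps = steps;
    }

    /** Runs the remaining supersteps; a failure leaves nextStep at the failed step. */
    void run(List<String> log) {
        while (nextStep < steps.size()) {
            steps.get(nextStep).compute(log);
            log.add("sync"); // stand-in for the real barrier synchronization
            nextStep++;
        }
    }

    int completedSteps() {
        return nextStep;
    }
}

class SuperstepSketch {
    public static void main(String[] args) {
        boolean[] failedOnce = {false};
        List<Superstep> steps = new ArrayList<>();
        steps.add(log -> log.add("compute1"));
        steps.add(log -> {
            if (!failedOnce[0]) {
                failedOnce[0] = true;
                throw new IllegalStateException("simulated failure in superstep 2");
            }
            log.add("compute2");
        });
        steps.add(log -> log.add("compute3"));

        SuperstepRunner runner = new SuperstepRunner(steps);
        List<String> log = new ArrayList<>();
        try {
            runner.run(log);
        } catch (IllegalStateException e) {
            log.add("recover");
        }
        runner.run(log); // resumes at superstep 2; compute1 is not repeated
        System.out.println(log);
    }
}
```

The design choice shown here is only that the runner, not user code, tracks the superstep boundary. Whether that is worth giving up the flexibility of a single bsp() is exactly the open question in the thread.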
