Does this mean that in each iteration the computation (the code within the bsp 
function) needs to read the same input, or different input? 

I ask because this seems related to what I mentioned previously about 
reworking the bsp function (providing a smaller unit of computation, e.g. a 
superstep). 

bsp(...) {
  sync()
  // superstep 1
  // read from hdfs
  // compute1()
  // send messages ...
  sync()
  // superstep 2
  // read from / write to pvfs
  // compute2()
  sync()
  // superstep 3
  // write to cassandra
  // compute3()
  sync()
  ...
}

The reason is that bsp() consists of several supersteps, and in each 
iteration users will probably want to read from, or write to, a different 
input or output. This is a pattern. Although the current bsp() is flexible, 
allowing users to write whatever they want within it, the disadvantages I 
observe include: 1) recovery is difficult; 2) a lot of code gets mixed 
together within one function. 

The first may be overcome by source code instrumentation, but that is not a 
good solution, because users would not know what went wrong, or where, when 
bsp() does not work properly. 

The second is relatively minor and can be addressed by, e.g., reorganizing 
the code in a more modular way. But that ends up looking similar to what we 
would get by providing, e.g., superstep(). 
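
To make the superstep() idea a bit more concrete, here is a minimal Java 
sketch. Superstep and SuperstepRunner are hypothetical names used only for 
illustration, not existing Hama classes, and sync() here just appends to a log 
instead of performing a real barrier. The point is only that, once each 
superstep is its own unit, the framework can remember which one completed last 
and resume from there.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface (not part of the Hama API): one unit of computation
// between two barrier synchronizations.
interface Superstep {
    void compute(List<String> log) throws Exception;
}

// Hypothetical runner: executes supersteps in order and remembers the index
// of the first superstep that has not completed yet.
class SuperstepRunner {
    private final List<Superstep> steps;
    private int nextStep = 0;

    SuperstepRunner(List<Superstep> steps) {
        this.steps = new ArrayList<>(steps);
    }

    int getNextStep() {
        return nextStep;
    }

    // Runs from the last completed superstep onward. If compute() throws,
    // nextStep still points at the failed superstep, so calling run() again
    // retries from there rather than from the beginning.
    void run(List<String> log) throws Exception {
        while (nextStep < steps.size()) {
            steps.get(nextStep).compute(log);
            sync(log);
            nextStep++;
        }
    }

    // Stand-in for the real barrier synchronization (assumption: logging only).
    private void sync(List<String> log) {
        log.add("sync");
    }
}

class SuperstepSketch {
    public static void main(String[] args) throws Exception {
        List<Superstep> steps = new ArrayList<>();
        steps.add(l -> l.add("superstep 1: read from hdfs, compute1, send messages"));
        steps.add(l -> l.add("superstep 2: read from / write to pvfs, compute2"));
        steps.add(l -> l.add("superstep 3: compute3, write to cassandra"));

        List<String> log = new ArrayList<>();
        new SuperstepRunner(steps).run(log);
        for (String line : log) {
            System.out.println(line);
        }
    }
}
```

With this shape, each superstep can also declare its own input and output, 
which keeps the per-step I/O out of one large function.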

Just some thoughts.  

-----Original message-----
From:Thomas Jungblut <[email protected]>
To:[email protected]
Date:Tue, 29 Nov 2011 04:39:38 +0100
Subject:Reset Input RecordReader

Hi all,

I need some kind of reset-logic for the input of a BSP Job.
It should be quite easy to add:
- add a method called resetInput() in BSPPeer
- in the concrete implementation, it just closes the input split and opens it
again
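
A minimal sketch of that close-and-reopen behaviour, under stated assumptions: 
ResettableInput is a hypothetical stand-in, the "input split" is just a local 
text file read line by line, and none of this is the actual BSPPeer code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical stand-in for a peer's input: a local file read line by line.
class ResettableInput implements AutoCloseable {
    private final Path split;
    private BufferedReader reader;

    ResettableInput(Path split) throws IOException {
        this.split = split;
        this.reader = Files.newBufferedReader(split);
    }

    // Returns the next record, or null once the split is exhausted.
    String readNext() throws IOException {
        return reader.readLine();
    }

    // Closes the current reader and reopens the same split from the start.
    void resetInput() throws IOException {
        reader.close();
        reader = Files.newBufferedReader(split);
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}

class ResetInputSketch {
    public static void main(String[] args) throws IOException {
        Path split = Files.createTempFile("split", ".txt");
        Files.write(split, Arrays.asList("1.0 2.0", "3.0 4.0"));

        try (ResettableInput in = new ResettableInput(split)) {
            // Two passes over the same input, as two supersteps would need.
            for (int pass = 1; pass <= 2; pass++) {
                String record;
                while ((record = in.readNext()) != null) {
                    System.out.println("pass " + pass + ": " + record);
                }
                in.resetInput();
            }
        } finally {
            Files.delete(split);
        }
    }
}
```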

If you're interested why I need this, I'm currently writing a k-means
clustering in BSP.
I need to iterate over all vectors from the input and measure distance
against a set of centers in each superstep, so it would help me to "reset"
the input.
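
For illustration, one such pass could look roughly like the sketch below. It 
assumes squared Euclidean distance, a single peer, and an in-memory list 
standing in for the input split; KMeansPass and its methods are hypothetical 
names, not code from the actual job. Every pass has to walk the full list 
again, which is what a reset of the input would allow.

```java
import java.util.Arrays;
import java.util.List;

class KMeansPass {
    // Index of the center closest to v (squared Euclidean distance).
    static int nearestCenter(double[] v, double[][] centers) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0;
            for (int i = 0; i < v.length; i++) {
                double diff = v[i] - centers[c][i];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // One full pass over all vectors: assign each to its nearest center and
    // return the recomputed centers. A center with no vectors is kept as-is.
    static double[][] onePass(List<double[]> vectors, double[][] centers) {
        int dim = centers[0].length;
        double[][] sums = new double[centers.length][dim];
        int[] counts = new int[centers.length];
        for (double[] v : vectors) {
            int c = nearestCenter(v, centers);
            counts[c]++;
            for (int i = 0; i < dim; i++) {
                sums[c][i] += v[i];
            }
        }
        double[][] next = new double[centers.length][];
        for (int c = 0; c < centers.length; c++) {
            if (counts[c] == 0) {
                next[c] = centers[c].clone();
            } else {
                next[c] = new double[dim];
                for (int i = 0; i < dim; i++) {
                    next[c][i] = sums[c][i] / counts[c];
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        List<double[]> vectors = Arrays.asList(
            new double[] {0, 0}, new double[] {0, 2},
            new double[] {10, 10}, new double[] {10, 12});
        double[][] centers = {{0, 0}, {10, 10}};
        // Each loop iteration corresponds to one superstep re-reading the input.
        for (int superstep = 0; superstep < 3; superstep++) {
            centers = onePass(vectors, centers);
        }
        System.out.println(Arrays.deepToString(centers));
    }
}
```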

Do you think I can add this right away into the trunk?

-- 
Thomas Jungblut
Berlin <[email protected]>


--
ChiaHung Lin
Department of Information Management
National University of Kaohsiung
Taiwan
