Hi Apurv,
> 1. I seem to understand the Pi example, here is what I have understood,
> please correct me if I am wrong.
> Each of the BSPPeers does the local computation of Pi and sends it to a
> special BSPPeer which we have chosen as the master node. The choice of
> master node is completely arbitrary. It is from this node that we later
> fetch the results.
Yes. MapReduce uses the same pattern as well, see the PiEstimator example in
Hadoop. A master-based design is quite bad in my opinion, but many algorithms
have to use one to keep a globally synchronized state.
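
To make the pattern concrete, here is a minimal plain-Java sketch. It only
simulates the idea and uses no Hama classes: the class name PiMasterSketch,
the method names, and the use of the peer index as random seed are all
assumptions made for illustration. Each "peer" computes a local Monte Carlo
estimate, and the "master" averages whatever landed in its inbox.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Plain-Java simulation of the master pattern; not the Hama API.
class PiMasterSketch {

    // Local computation of one peer: Monte Carlo estimate of Pi.
    static double localEstimate(long seed, int iterations) {
        Random rnd = new Random(seed);
        int inside = 0;
        for (int i = 0; i < iterations; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++; // point fell inside the quarter circle
            }
        }
        return 4.0 * inside / iterations;
    }

    // What the master does once all messages have arrived: average them.
    static double aggregate(List<Double> received) {
        double sum = 0.0;
        for (double d : received) {
            sum += d;
        }
        return sum / received.size();
    }

    // Simulates numPeers peers sending their estimate to one master inbox.
    static double run(int numPeers, int iterations) {
        List<Double> masterInbox = new ArrayList<>();
        for (int peer = 0; peer < numPeers; peer++) {
            masterInbox.add(localEstimate(peer, iterations)); // seed = peer index (assumption)
        }
        return aggregate(masterInbox);
    }

    public static void main(String[] args) {
        System.out.println("Estimated Pi: " + run(4, 100_000));
    }
}
```

In a real job the list would be replaced by messages sent to the master peer,
and the average would be computed after the barrier.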
> 2. I read that a BSP Task is composed of a series of supersteps, when we
> write sync() (which flushes all messages to the input queues of the
> intended BSPPeers), does this correspond to a completion of one superstep
> in the whole computation? Most computations have a sync() as the last line
> in the bsp function.
When you call sync() a superstep ends. When the method returns, you're in a
new superstep.
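
A small plain-Java sketch of that behaviour, in case it helps. It is a
single-threaded simulation and not the Hama API: SuperstepSketch and its
methods are hypothetical names, and replacing the inbox on every sync() is a
simplifying assumption. The point is that a message sent in superstep n only
becomes readable after sync() has returned, i.e. in superstep n+1.

```java
import java.util.ArrayList;
import java.util.List;

// Single-threaded simulation of superstep semantics; not the Hama API.
class SuperstepSketch {
    // Messages sent during the current superstep, per destination peer.
    private final List<List<String>> outgoing = new ArrayList<>();
    // Messages delivered by the most recent sync(), per peer.
    private final List<List<String>> inbox = new ArrayList<>();
    private int superstep = 0;

    SuperstepSketch(int numPeers) {
        for (int i = 0; i < numPeers; i++) {
            outgoing.add(new ArrayList<>());
            inbox.add(new ArrayList<>());
        }
    }

    // Buffers a message; the receiver cannot see it yet.
    void send(int toPeer, String msg) {
        outgoing.get(toPeer).add(msg);
    }

    // Barrier: delivers all buffered messages and starts the next superstep.
    void sync() {
        for (int p = 0; p < inbox.size(); p++) {
            inbox.set(p, new ArrayList<>(outgoing.get(p)));
            outgoing.get(p).clear();
        }
        superstep++;
    }

    List<String> received(int peer) {
        return inbox.get(peer);
    }

    int superstep() {
        return superstep;
    }
}
```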
> 3. Just as in hadoop each Map/Reduce Task gets an input split, does the
> bsp task also gets an input split. If yes, can we use the readNext() method
> in BSPPeer interface to obtain the data from files.
Yes. You can also read the input again from the beginning by calling
reopenInput().
> 4. How is a matrix going to be represented in the file? Are there any
> papers that describe matrix algorithms on the BSP framework.
Interesting topic.
A dense matrix can be represented as a two-dimensional array. A sparse
matrix could be a hash map from a row or column id to a vector (which
can itself be sparse or dense).
In a SequenceFile I would write the row id as a LongWritable key and a
Vector implementation as the value.
But that is just a naive approach, there are better ones. In the end it
depends on which algorithm you want to implement.
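
As a rough illustration of the two in-memory layouts, here is a plain-Java
sketch with no Hadoop or Hama types. MatrixSketch and its methods are made-up
names, and using a nested HashMap as the "sparse vector" is just one possible
choice. It stores the same matrix densely and as row id -> (column -> value),
and multiplies both forms with a vector.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: dense 2D array vs. map of row id to sparse row vector.
class MatrixSketch {

    // Dense: y = M * x with M stored as a two-dimensional array.
    static double[] multiplyDense(double[][] m, double[] x) {
        double[] y = new double[m.length];
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < x.length; c++) {
                y[r] += m[r][c] * x[c];
            }
        }
        return y;
    }

    // Sparse: row id -> (column index -> non-zero value).
    // Rows that contain only zeros are not stored at all.
    static Map<Long, Map<Integer, Double>> toSparse(double[][] m) {
        Map<Long, Map<Integer, Double>> rows = new HashMap<>();
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < m[r].length; c++) {
                if (m[r][c] != 0.0) {
                    rows.computeIfAbsent((long) r, k -> new HashMap<>()).put(c, m[r][c]);
                }
            }
        }
        return rows;
    }

    // Sparse multiply: only stored rows and stored cells are touched.
    static Map<Long, Double> multiplySparse(Map<Long, Map<Integer, Double>> rows, double[] x) {
        Map<Long, Double> y = new HashMap<>();
        for (Map.Entry<Long, Map<Integer, Double>> row : rows.entrySet()) {
            double sum = 0.0;
            for (Map.Entry<Integer, Double> cell : row.getValue().entrySet()) {
                sum += cell.getValue() * x[cell.getKey()];
            }
            y.put(row.getKey(), sum);
        }
        return y;
    }
}
```

The (row id, row vector) entries of the sparse form are what would become the
key/value records in the file, one record per row.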
I'm not a paper guy, so maybe others can link you to some cool papers
about this ;)
Greetings,
Thomas
2012/1/15 Apurv Verma <[email protected]>
> Hii all,
> I seem to have overcome my initial hadoop setting up problems. I have some
> questions.
>
>
> 1. I seem to understand the Pi example, here is what I have understood,
> please correct me if I am wrong.
> Each of the BSPPeers does the local computation of Pi and sends it to a
> special BSPPeer which we have chosen as the master node. The choice of
> master node is completely arbitrary. It is from this node that we later
> fetch the results.
>
> 2. I read that a BSP Task is composed of a series of supersteps, when we
> write sync() {which flushes all messages to the input queues of the
> intended BSPPeers , does this correspond to a completion of one superstep
> in the whole computation? Most computations have a sync() as the last
> line
> in the bsp function.
>
> 3. Just as in hadoop each Map/Reduce Task gets an input split, does the
> bsp task also gets an input split. If yes, can we use the readNext()
> method
> in BSPPeer interface to obtain the data from files.
>
> 4. How is a matrix going to be represented in the file? Are there any
> papers that describe matrix algorithms on the BSP framework.
>
>
> Thank you all for the previous help and support !!
>
> --
> thanks and regards,
>
> Apurv Verma
> B. Tech.(CSE)
> IIT- Ropar
>
--
Thomas Jungblut
Berlin <[email protected]>