Re: Hama Doubts

Thomas Jungblut Sun, 15 Jan 2012 04:47:12 -0800

Hi Apurv,

>   1. I seem to understand the Pi example, here is what I have understood,
>   please correct me if I am wrong.
>   Each of the BSPPeers does the local computation of Pi and sends it to a
>   special BSPPeer which we have chosen as the master node. The choice of
>   master node is completely arbitrary. It is from this node that we later
>   fetch the results.



Yes. MapReduce uses the same pattern; see the PiEstimator example in Hadoop.
In my opinion a master is quite a bad design, but many algorithms have to
use one to keep a globally synchronized state.
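
To make the pattern from question 1 concrete, here is a minimal plain-Java sketch that simulates all peers in a single process. It does not use Hama's classes: `PiMasterSketch`, `localEstimate` and `estimate` are hypothetical names, and the per-peer inbox lists only stand in for real messaging plus the sync() barrier. Each simulated peer computes a Monte Carlo estimate of Pi, sends it to the chosen master, and the master averages what it received.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Plain-Java simulation of "compute locally, then aggregate on a master".
// No Hama classes are used; peer indices and method names are hypothetical.
class PiMasterSketch {

    // Work done by every peer: fraction of random points in the unit square
    // that fall inside the quarter circle, times 4.
    static double localEstimate(long seed, int samples) {
        Random rnd = new Random(seed);
        int inside = 0;
        for (int i = 0; i < samples; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return 4.0 * inside / samples;
    }

    // Simulates numPeers peers. masterIndex picks the peer that collects the
    // results; the choice is arbitrary and does not change the answer.
    static double estimate(int numPeers, int samplesPerPeer, int masterIndex) {
        List<List<Double>> inboxes = new ArrayList<>();
        for (int p = 0; p < numPeers; p++) {
            inboxes.add(new ArrayList<>());
        }
        // Superstep 1: each peer computes locally and "sends" to the master.
        for (int p = 0; p < numPeers; p++) {
            inboxes.get(masterIndex).add(localEstimate(p, samplesPerPeer));
        }
        // A sync() barrier would sit here. Superstep 2: only the master reads.
        double sum = 0.0;
        for (double e : inboxes.get(masterIndex)) {
            sum += e;
        }
        return sum / numPeers;
    }

    public static void main(String[] args) {
        System.out.println("Pi ~ " + estimate(4, 100_000, 0));
    }
}
```

Calling `estimate` with a different master index returns exactly the same value here, which mirrors the point that the choice of master is arbitrary.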

>   2. I read that a BSP task is composed of a series of supersteps. When we
>   write sync() (which flushes all messages to the input queues of the
>   intended BSPPeers), does this correspond to the completion of one
>   superstep in the whole computation? Most computations have a sync() as
>   the last line in the bsp function.


When you call sync(), the current superstep ends. When the method returns,
you're in a new superstep.
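
The superstep boundary can be modeled in a few lines. The following single-threaded Java sketch is only an illustration under simplifying assumptions, not Hama's implementation: `SuperstepSketch`, `send`, `messagesFor` and its `sync` are hypothetical. Messages sent during a superstep are buffered and become readable only after sync(), which also advances the superstep counter; in this model, unread messages from the previous superstep are simply dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Single-threaded model of BSP superstep semantics (hypothetical names, not
// Hama's API): messages sent in a superstep are visible only after sync().
class SuperstepSketch {
    private final List<List<String>> pending = new ArrayList<>(); // sent this superstep
    private final List<List<String>> inbox = new ArrayList<>();   // readable this superstep
    private int superstep = 0;

    SuperstepSketch(int numPeers) {
        for (int p = 0; p < numPeers; p++) {
            pending.add(new ArrayList<>());
            inbox.add(new ArrayList<>());
        }
    }

    void send(int toPeer, String msg) {
        pending.get(toPeer).add(msg);
    }

    List<String> messagesFor(int peer) {
        return inbox.get(peer);
    }

    int superstep() {
        return superstep;
    }

    // Ends the current superstep: buffered messages are delivered, the old
    // inbox contents are replaced, and the superstep counter advances.
    void sync() {
        for (int p = 0; p < pending.size(); p++) {
            inbox.set(p, pending.get(p));
            pending.set(p, new ArrayList<>());
        }
        superstep++;
    }

    public static void main(String[] args) {
        int n = 3;
        SuperstepSketch bsp = new SuperstepSketch(n);
        // Superstep 0: every peer sends a greeting to its right neighbour in a ring.
        for (int p = 0; p < n; p++) {
            bsp.send((p + 1) % n, "hello from " + p);
        }
        System.out.println("before sync: " + bsp.messagesFor(1)); // []
        bsp.sync();
        System.out.println("after sync (superstep " + bsp.superstep() + "): "
                + bsp.messagesFor(1)); // [hello from 0]
    }
}
```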

>   3. Just as in Hadoop each Map/Reduce task gets an input split, does the
>   BSP task also get an input split? If yes, can we use the readNext()
>   method in the BSPPeer interface to obtain the data from files?


Yes. You can also read the input again from the beginning by calling
reOpenInput().
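
Reopening the input matters for algorithms that need more than one pass over their split. The sketch below uses a hypothetical in-memory `InMemorySplit` class to mimic reading records until the input is exhausted and then rewinding; it is not the BSPPeer interface, whose exact method names and signatures may differ. The example computes a population variance, which needs the mean from a first pass before the second pass can run.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a task's input split. It mimics the
// read-until-exhausted / reopen-from-start idea; it is not Hama's BSPPeer API.
class InMemorySplit {
    private final List<Map.Entry<Long, Double>> records;
    private int position = 0;

    InMemorySplit(List<Map.Entry<Long, Double>> records) {
        this.records = records;
    }

    // Returns the next key/value record, or null once the split is exhausted.
    Map.Entry<Long, Double> readNext() {
        return position < records.size() ? records.get(position++) : null;
    }

    // Rewinds to the first record so the split can be scanned again.
    void reopenInput() {
        position = 0;
    }

    // Two passes over the same split: the mean is needed before the variance.
    static double variance(InMemorySplit split) {
        double sum = 0.0;
        long n = 0;
        Map.Entry<Long, Double> rec;
        while ((rec = split.readNext()) != null) {
            sum += rec.getValue();
            n++;
        }
        double mean = sum / n;

        split.reopenInput();
        double squares = 0.0;
        while ((rec = split.readNext()) != null) {
            double d = rec.getValue() - mean;
            squares += d * d;
        }
        return squares / n;
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, Double>> data = new ArrayList<>();
        double[] values = {2, 4, 4, 4, 5, 5, 7, 9};
        for (int i = 0; i < values.length; i++) {
            data.add(new AbstractMap.SimpleImmutableEntry<>((long) i, values[i]));
        }
        System.out.println(variance(new InMemorySplit(data))); // 4.0
    }
}
```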

>   4. How is a matrix going to be represented in the file? Are there any
>   papers that describe matrix algorithms on the BSP framework?


Interesting topic.
A dense matrix can be represented as a two-dimensional array; a sparse
matrix could be a hash map from a row or column id to a vector (which can
itself be sparse or dense).
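
Here is a minimal Java sketch of the two layouts, assuming plain JDK collections rather than any Hama or Hadoop matrix class; `MatrixLayouts`, `toSparseRows` and `get` are illustrative names. The sparse form is a hash map from row id to a sparse row, which is itself a map from column id to value, and absent entries are treated as zero.

```java
import java.util.HashMap;
import java.util.Map;

// Dense matrix as double[][]; sparse matrix as row id -> (column id -> value).
// Class and method names are illustrative only.
class MatrixLayouts {

    // Builds the sparse form by keeping only the non-zero entries.
    static Map<Integer, Map<Integer, Double>> toSparseRows(double[][] dense) {
        Map<Integer, Map<Integer, Double>> rows = new HashMap<>();
        for (int i = 0; i < dense.length; i++) {
            for (int j = 0; j < dense[i].length; j++) {
                if (dense[i][j] != 0.0) {
                    rows.computeIfAbsent(i, k -> new HashMap<>()).put(j, dense[i][j]);
                }
            }
        }
        return rows;
    }

    // Missing rows and missing columns are implicit zeros in the sparse form.
    static double get(Map<Integer, Map<Integer, Double>> rows, int i, int j) {
        Map<Integer, Double> row = rows.get(i);
        return row == null ? 0.0 : row.getOrDefault(j, 0.0);
    }

    public static void main(String[] args) {
        double[][] dense = {
            {0, 0, 3},
            {0, 0, 0},
            {1, 0, 0}
        };
        Map<Integer, Map<Integer, Double>> sparse = toSparseRows(dense);
        System.out.println(sparse.size() + " non-empty rows"); // row 1 is all zeros
        System.out.println(get(sparse, 0, 2) + " " + get(sparse, 1, 1)); // 3.0 0.0
    }
}
```

The sparse layout stores only 2 entries here instead of 9, at the cost of a hash lookup per access; which layout wins depends on how sparse the data really is.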

In a SequenceFile I would write the row id as a LongWritable key and a
Vector implementation as the value.
But that is just a naive approach; there are better ones, and which one
fits depends on the algorithm you want to implement.
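
The row-per-record layout can be sketched without Hadoop on the classpath. In the following hypothetical example a `long` row id stands in for the LongWritable key and a `double[]` for the Vector value; `RowRecordsSketch` and its methods are illustrative, not an existing API. It shows what a task holding a subset of the rows could compute: its own entries of the matrix-vector product y = A * x.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in for "row id as key, row vector as value" records. In a real
// SequenceFile the key would be a LongWritable and the value a Vector Writable;
// a long and a double[] are used here so the sketch runs without Hadoop.
class RowRecordsSketch {

    // One record per matrix row: (row id, dense row vector).
    static List<Map.Entry<Long, double[]>> toRowRecords(double[][] matrix) {
        List<Map.Entry<Long, double[]>> records = new ArrayList<>();
        for (int i = 0; i < matrix.length; i++) {
            records.add(new AbstractMap.SimpleImmutableEntry<>((long) i, matrix[i]));
        }
        return records;
    }

    // What one task could do with the rows in its split: its entries of y = A * x.
    static Map<Long, Double> multiplyRows(List<Map.Entry<Long, double[]>> split, double[] x) {
        Map<Long, Double> partial = new TreeMap<>();
        for (Map.Entry<Long, double[]> rec : split) {
            double[] row = rec.getValue();
            double dot = 0.0;
            for (int j = 0; j < row.length; j++) {
                dot += row[j] * x[j];
            }
            partial.put(rec.getKey(), dot);
        }
        return partial;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}, {5, 6}};
        double[] x = {1, 1};
        List<Map.Entry<Long, double[]>> records = toRowRecords(a);
        // Pretend the records were divided between two tasks.
        Map<Long, Double> y = new TreeMap<>(multiplyRows(records.subList(0, 2), x));
        y.putAll(multiplyRows(records.subList(2, 3), x));
        System.out.println(y); // {0=3.0, 1=7.0, 2=11.0}
    }
}
```

Keying by row id makes each row independent, so any task can process any subset of rows; algorithms that need column access or blocks would favour a different layout.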

I'm not a paper guy, so maybe others can point you to some cool papers
about this ;)

Greetings,
Thomas

2012/1/15 Apurv Verma <[email protected]>

> Hi all,
>  I seem to have overcome my initial Hadoop setup problems. I have some
> questions.
>
> Thank you all for the previous help and support!
>
> --
> thanks and regards,
>
> Apurv Verma
> B. Tech.(CSE)
> IIT- Ropar
>



-- 
Thomas Jungblut
Berlin <[email protected]>
