Re: [9fans] parallel/distributed computation

erik quanstrom Fri, 26 Oct 2007 22:53:49 -0700

thanks.

- erik


> BSP is essentially all about "the inner loop". In this loop, you do
> the work, and, at the bottom of the loop, you tell everyone what you
> have done.
> 
> So you are either computing or communicating. Which means, on your
> $100M computer, that you are using about $50M of it over time. Which
> is undesirable.
> 
> Nowadays, people work fairly hard to ensure that while computation is
> happening, the network is busy moving data.
> 
> This problem with BSP is well known, which is why some folks have
> tried to time-share the nodes in the following
> way (www.ccs3.lanl.gov/pal/publications/papers/petrini01:feng.pdf):
> have N jobs (N usually 2). While N-1 jobs are using the network, and
> hence not computing, have 1 job computing. Of course, matching this
> all up is hard, and most compute jobs typically are sized to use all
> of memory, so this approach has not been used much. The nodes on the
> big machines are typically not shared between jobs.
> 
> BSP was an interesting idea but is not commonly used any more, at
> least on the systems I know about. Rather, people work hard to overlap
> communication and computation.
> 
> ron
> p.s. for more recent work see: www.cs.unm.edu/~fastos/06meeting/sft.pdf
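
A rough sketch of the arithmetic in the quoted message, for readers who
want to see where the "about half the machine" figure comes from. It is
a toy cost model, not anything from ron's post or the linked papers: the
function names are made up for illustration, and it assumes each step
spends equal time computing and communicating, and that overlap is
perfect when it is used.

```python
# Toy per-step cost model (hypothetical names; times in arbitrary units).

def bsp_step_time(compute, comm):
    """Strict BSP-style step: compute, then communicate, never both at once."""
    return compute + comm


def overlapped_step_time(compute, comm):
    """Idealized overlap: the network moves data while the CPUs compute."""
    return max(compute, comm)


def cpu_utilization(compute, step_time):
    """Fraction of the step during which the CPUs are doing useful work."""
    return compute / step_time


# Assumption: compute and communication take the same time per step.
compute, comm = 1.0, 1.0

bsp = bsp_step_time(compute, comm)
ovl = overlapped_step_time(compute, comm)

print(f"BSP CPU utilization:        {cpu_utilization(compute, bsp):.0%}")  # 50%
print(f"overlapped CPU utilization: {cpu_utilization(compute, ovl):.0%}")  # 100%
```

With a different compute-to-communication ratio the numbers shift, but
the shape is the same: the serialized step costs the sum of the two
phases, while the overlapped step costs only the longer one.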
