On 10/26/07, erik quanstrom <[EMAIL PROTECTED]> wrote:
> could you elaborate or give a pointer explaining why
> bsp is insufficient?

BSP is essentially all about "the inner loop". In this loop, you do the work, and, at the bottom of the loop, you tell everyone what you have done. So you are either computing or communicating. Which means, on your $100M computer, that you are using about $50M of it over time. Which is undesirable. Nowadays, people work fairly hard to ensure that while computation is happening, the network is busy moving data.

This problem with BSP is well known, which is why some folks have tried to time-share the nodes in the following way (www.ccs3.lanl.gov/pal/publications/papers/petrini01:feng.pdf): have N jobs (N usually 2). While N-1 jobs are using the network, and hence not computing, have 1 job computing. Of course, matching this all up is hard, and most compute jobs typically are sized to use all of memory, so this approach has not been used much. The nodes on the big machines are typically not shared between jobs.

BSP was an interesting idea but is not commonly used any more, at least on the systems I know about. Rather, people work hard to overlap communication and computation.

ron

p.s. for more recent work see: www.cs.unm.edu/~fastos/06meeting/sft.pdf
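
A rough back-of-the-envelope sketch of the "$50M out of $100M" arithmetic, in Python. The function names, the idealized pipeline model, and the assumption that a compute phase and a communication phase take equal time are all illustrative choices, not anything taken from the papers linked above.

```python
def serialized_time(steps, compute, comm):
    """BSP-style inner loop: each step computes, then communicates."""
    return steps * (compute + comm)


def overlapped_time(steps, compute, comm):
    """Idealized overlap: the communication for step i runs while step
    i+1 computes. The first compute phase and the last communication
    phase have nothing to hide behind. Assumes the data dependencies
    allow this (for example, via double buffering)."""
    if steps == 0:
        return 0
    return compute + (steps - 1) * max(compute, comm) + comm


def cpu_utilization(steps, compute, total_time):
    """Fraction of wall-clock time the processors spend computing."""
    return steps * compute / total_time


# Assumed workload: 100 steps, compute and communication equally long.
STEPS, COMPUTE, COMM = 100, 1.0, 1.0
t_serial = serialized_time(STEPS, COMPUTE, COMM)
t_overlap = overlapped_time(STEPS, COMPUTE, COMM)
u_serial = cpu_utilization(STEPS, COMPUTE, t_serial)
u_overlap = cpu_utilization(STEPS, COMPUTE, t_overlap)
```

Under these assumptions the serialized loop takes 200 time units with the processors busy half the time, while the overlapped loop takes 101 units with the processors busy about 99% of the time. If communication is much cheaper than computation, the gap shrinks accordingly.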
