On 2008-09-10, David Roundy <[EMAIL PROTECTED]> wrote:
> On Wed, Sep 10, 2008 at 03:30:50PM +0200, Jed Brown wrote:
>> On Wed 2008-09-10 09:05, David Roundy wrote:
>> > I should point out, however, that in my experience MPI programming
>> > involves deadlocks and synchronization handling that are at least as
>> > nasty as any I've run into doing shared-memory threading.
>>
>> Absolutely, avoiding deadlock is the first priority (before error
>> handling). If you use the non-blocking interface, you have to be very
>> conscious of whether a buffer is being used or the call has completed.
>> Regardless, the API requires the programmer to maintain a very clear
>> distinction between locally owned and remote memory.
>
> Even with the blocking interface, you had subtle bugs that I found
> pretty tricky to deal with. E.g. the reduce functions in lam3 (or was
> it lam4) at one point didn't actually manage to result in the same
> values on all nodes (with differences caused by roundoff error), which
> led to rare deadlocks when it so happened that two nodes disagreed as
> to when a loop was completed. Perhaps someone made the mistake of
> assuming that addition was associative, or maybe it was something
> triggered by the non-IEEE floating point we were using. But in any
> case, it was pretty nasty. And it was precisely the kind of bug that
> won't show up except when you're doing something like MPI where you
> are pretty much forced to assume that the same (pure!) computation has
> the same effect on each node.
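To make Jed's point about the non-blocking interface concrete, here is a
minimal C sketch (not from either mail; the function name send_row and
the buffer are made up, only the MPI calls themselves are standard).
The send buffer belongs to MPI until MPI_Wait reports that the request
has completed:

    #include <mpi.h>

    /* Post a non-blocking send and wait for it before reusing the buffer. */
    void send_row(double *buf, int n, int peer, MPI_Comm comm)
    {
        MPI_Request req;
        MPI_Isend(buf, n, MPI_DOUBLE, peer, 0 /* tag */, comm, &req);

        /* Writing to buf here would be a bug: the send may still be
           reading from it. */

        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* only now is buf safe to reuse */
    }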
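The deadlock David describes comes from a termination test that depends
on a reduction. A rough sketch of the pattern (again made up, not LAM's
actual code; do_one_iteration stands in for whatever local work produces
a residual): if the reduction does not return bit-identical results on
every rank, ranks sitting right at the tolerance can disagree about
whether to go around the loop again, and the next collective call hangs.

    #include <mpi.h>

    double do_one_iteration(void);   /* stand-in for the local work */

    void solve(MPI_Comm comm)
    {
        double local_err, global_err;
        do {
            local_err = do_one_iteration();
            /* Every rank must see exactly the same global_err here;
               a reduce that combines values in a different order on
               different ranks can return slightly different sums. */
            MPI_Allreduce(&local_err, &global_err, 1, MPI_DOUBLE,
                          MPI_SUM, comm);
        } while (global_err > 1e-10);
    }

Floating-point addition is not associative, so a reduction that combines
partial sums in a different order on different ranks can legitimately
produce slightly different answers; the property a loop like this relies
on is that all ranks at least see the same answer.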
Ah, okay. I think that's a real edge case, and probably not how most
people use MPI. I've used both threads and MPI; MPI, while cumbersome,
never gave me any hard-to-debug deadlock problems.

-- 
Aaron Denney
-><-