On Sat, 3 Oct 1998, Loic Prylli wrote:

> 
> [message cross-posted to the beowulf list (as well as linux-smp),
> people might be interested there ]
> 
> Hello,
> 
> 
> It seems some people have had problems with reentrant interrupts for
> some time, this is more often seen when using parallel applications on clusters
> with fast networks:

I have been having problems on a Supermicro P6DBE (testing other
boards today).  The problems seem related to this issue. I tried your
patch, but it made no difference :(

My problem, described previously, results in lock-ups, wrong answers,
or stalled MPI communication when doing lots of communication on a
cluster.  The problem is absent when I have a single copy of the program
running on a single SMP node, but as soon as there is more than one
copy running and communicating with other nodes, the instability starts.

This seems to occur in both 2.0.X and 2.1.X kernels, with both LAM-MPI
and MPICH, and over both Fast Ethernet and Myrinet.

This leads me to the conclusion that Linux SMP is not stable
for my hardware configuration (which is pretty standard).

Doug Eadline
-------------------------------------------------------------------
Paralogic, Inc.           |     PEAK     |      Voice:+610.861.6960
115 Research Drive        |   PARALLEL   |        Fax:+610.861.8247
Bethlehem, PA 18017 USA   |  PERFORMANCE |    http://www.plogic.com
-------------------------------------------------------------------
