On Mon, Jul 28, 2008 at 12:08 PM, Terry Dontje <terry.don...@sun.com> wrote:
> Jeff Squyres wrote:
>> On Jul 28, 2008, at 12:03 PM, George Bosilca wrote:
>>
>>> Interesting. The self is only used for local communications. I don't
>>> expect that any benchmark executes such communications, but apparently
>>> I was wrong. Please let me know the failing test; I will take a look
>>> this evening.
>>
>> FWIW, my manual tests of a simplistic "ring" program work for all
>> combinations (openib, openib+self, openib+self+sm). Shrug.
>>
>> But for OSU latency, I found that openib and openib+sm work, but
>> openib+sm+self hangs (same results whether the 2 procs are on the same
>> node or on different nodes). There is no self communication in
>> osu_latency, so something else must be going on.
>>
> Is it something to do with the MPI_Barrier call? osu_latency uses
> MPI_Barrier, and from rhc's email it sounds like his code does too.

I don't think it's an issue with MPI_Barrier(). I'm running into this
problem with srtest.c (one of the example programs from the mpich
distribution). It's a ring-type test with no barriers until the end, yet
it hangs on the very first Send/Recv pair from rank 0 to rank 1. In my
case, openib and openib+sm work, but openib+self and openib+sm+self hang.

--brad

> --td
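P.S. For anyone who wants a quick reproducer without grabbing srtest.c
from the mpich distribution, the failing pattern boils down to the first
hop of a ring. The sketch below is my own minimal approximation, not the
actual srtest.c source, and the file/executable name "ring" is just a
placeholder:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, token = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0)
            fprintf(stderr, "run with at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        /* First hop of the ring: 0 -> 1.  This mirrors the very first
         * Send/Recv pair that hangs for me in srtest.c. */
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD, &status);
    } else {
        /* Everyone else receives from the previous rank and forwards
         * the token to the next rank around the ring. */
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &status);
        MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
    }

    /* Like srtest.c, no barrier until the very end. */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}

Running something like "mpirun -np 2 --mca btl openib,sm,self ./ring"
versus "mpirun -np 2 --mca btl openib,sm ./ring" should exercise the same
BTL combinations discussed above.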