Gleb,

Are you looking at this?

Rich

On 8/29/07 9:56 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:

> On Wed, Aug 29, 2007 at 04:48:07PM +0300, Gleb Natapov wrote:
>> Is this trunk or 1.2?
> Oops. I should read more carefully :) This is trunk.
>
>>
>> On Wed, Aug 29, 2007 at 09:40:30AM -0400, Terry D. Dontje wrote:
>>> I have a program that does a simple bucket brigade of sends and receives
>>> where rank 0 is the start and repeatedly sends to rank 1 until a certain
>>> amount of time has passed, and then it sends an all-done packet.
>>>
>>> Running this under np=2 always works. However, when I run with greater
>>> than 2 using only the SM btl the program usually hangs and one of the
>>> processes has a long stack that has a lot of the following 3 calls in it:
>>>
>>> [25] opal_progress(), line 187 in "opal_progress.c"
>>> [26] mca_btl_sm_component_progress(), line 397 in "btl_sm_component.c"
>>> [27] mca_bml_r2_progress(), line 110 in "bml_r2.c"
>>>
>>> When stepping through the ompi_fifo_write_to_head routine it looks like
>>> the fifo has overflowed.
>>>
>>> I am wondering if what is happening is rank 0 has sent a bunch of
>>> messages that have exhausted the
>>> resources such that one of the middle ranks which is in the process of
>>> sending cannot send and therefore
>>> never gets to the point of trying to receive the messages from rank 0?
>>>
>>> Is the above a possible scenario or are messages periodically bled off
>>> the SM BTL's fifos?
>>>
>>> Note, I have seen np=3 pass sometimes and I can get it to pass reliably
>>> if I raise the shared memory space used by the BTL. This is using the
>>> trunk.
>>>
>>>
>>> --td
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>> --
>> Gleb.
>
> --
> Gleb.
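
For readers following the thread, the scenario Terry asks about can be illustrated with a toy model. This is only a sketch under stated assumptions, not Open MPI's SM BTL code: it assumes every in-flight message holds one slot from a single shared pool, that a rank blocked in a send never reaches its receive, and that ranks are stepped in a fixed round-robin order (the real hang is timing dependent). The names `run_brigade`, `pool_size`, and `messages` are hypothetical.

```python
from collections import deque

def run_brigade(nprocs, pool_size, messages):
    """Toy bucket brigade over a shared slot pool (not Open MPI code).

    Returns True if every message reaches the last rank, and False if
    no rank can make progress, which models the reported hang.
    """
    free = pool_size                            # slots left in the shared pool
    inbox = [deque() for _ in range(nprocs)]    # per-rank receive queue
    to_send = [0] * nprocs                      # messages each rank must still send
    to_send[0] = messages                       # rank 0 is the source
    delivered = 0

    while delivered < messages:
        progressed = False
        for rank in range(nprocs):
            if to_send[rank] > 0:
                # A rank with a pending send does not receive until it sends.
                if free > 0:
                    free -= 1
                    inbox[rank + 1].append(1)
                    to_send[rank] -= 1
                    progressed = True
            elif inbox[rank]:
                # Receiving returns the slot to the shared pool.
                inbox[rank].popleft()
                free += 1
                if rank == nprocs - 1:
                    delivered += 1              # last rank only consumes
                else:
                    to_send[rank] = 1           # middle rank must forward it
                progressed = True
        if not progressed:
            return False                        # every rank is stuck
    return True

if __name__ == "__main__":
    print(run_brigade(2, 2, 10))    # two ranks, small pool
    print(run_brigade(3, 2, 10))    # three ranks, small pool
    print(run_brigade(3, 20, 10))   # three ranks, larger pool
```

In this model the two-rank case always completes because the receiver never has to send, while a middle rank can stall once rank 0 has taken the last slot. Enlarging the pool removes the stall, which is at least consistent with the observation that raising the shared memory space makes np=3 pass.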