Heard you the first time, Gleb; I've just been backed up with other stuff.
Following is the code:
include "mpif.h"
character(20) cmd_line_arg ! We'll use the first command-line argument
! to set the duration of the test.
real(8) :: duration = 10 ! The default duration (in seconds) can be
! set here.
real(8) :: endtime ! This is the time at which we'll end the
! test.
integer(8) :: nmsgs = 1 ! We'll count the number of messages sent
! out from each MPI process. There will be
! at least one message (at the very end),
! and we'll count all the others.
logical :: keep_going = .true. ! This flag says whether to keep going.
! Initialize MPI stuff.
call MPI_Init(ier)
call MPI_Comm_rank(MPI_COMM_WORLD, me, ier)
call MPI_Comm_size(MPI_COMM_WORLD, np, ier)
if ( np == 1 ) then
! Test to make sure there is at least one other process.
write(6,*) "Need at least 2 processes."
write(6,*) "Try resubmitting the job with"
write(6,*) " 'mpirun -np <np>'"
write(6,*) "where <np> is at least 2."
else if ( me == 0 ) then
! The first command-line argument is the duration of the test (seconds).
call get_command_argument(1,cmd_line_arg,len,istat)
if ( istat == 0 ) read(cmd_line_arg,*) duration
! Loop until test is done.
endtime = MPI_Wtime() + duration ! figure out when to end
do while ( MPI_Wtime() < endtime )
call MPI_Send(keep_going,1,MPI_LOGICAL,1,1,MPI_COMM_WORLD,ier)
nmsgs = nmsgs + 1
end do
! Then, send the closing signal.
keep_going = .false.
call MPI_Send(keep_going,1,MPI_LOGICAL,1,1,MPI_COMM_WORLD,ier)
! Write summary information.
write(6,'("Target duration (seconds):",f18.6)') duration
write(6,'("# of messages sent in that time:", i12)') nmsgs
write(6,'("Microseconds per message:", f19.3)') 1.d6 * duration /
nmsgs
else
! If you're not Process 0, you need to receive messages
! (and possibly relay them onward).
do while ( keep_going )
call MPI_Recv(keep_going,1,MPI_LOGICAL,me-1,1,MPI_COMM_WORLD, &
MPI_STATUS_IGNORE,ier)
if ( me == np - 1 ) cycle ! The last process only receives messages.
call MPI_Send(keep_going,1,MPI_LOGICAL,me+1,1,MPI_COMM_WORLD,ier)
end do
end if
! Finalize.
call MPI_Finalize(ier)
end
Sorry it is in Fortran.
--td
Gleb Natapov wrote:
On Wed, Aug 29, 2007 at 11:01:14AM -0400, Richard Graham wrote:
If you are going to look at it, I will not bother with this.
I need the code to reproduce the problem. Otherwise I have nothing to
look at.
Rich
On 8/29/07 10:47 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:
On Wed, Aug 29, 2007 at 10:46:06AM -0400, Richard Graham wrote:
Gleb,
Are you looking at this ?
Not today. And I need the code to reproduce the bug. Is this possible?
Rich
On 8/29/07 9:56 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:
On Wed, Aug 29, 2007 at 04:48:07PM +0300, Gleb Natapov wrote:
Is this trunk or 1.2?
Oops. I should read more carefully :) This is trunk.
On Wed, Aug 29, 2007 at 09:40:30AM -0400, Terry D. Dontje wrote:
I have a program that does a simple bucket brigade of sends and receives,
where rank 0 is the start and repeatedly sends to rank 1 until a certain
amount of time has passed, and then it sends an all-done packet.
Running this under np=2 always works. However, when I run with more than 2
processes using only the SM btl, the program usually hangs and one of the
processes has a long stack with many repetitions of the following 3 calls:
[25] opal_progress(), line 187 in "opal_progress.c"
[26] mca_btl_sm_component_progress(), line 397 in "btl_sm_component.c"
[27] mca_bml_r2_progress(), line 110 in "bml_r2.c"
When stepping through the ompi_fifo_write_to_head routine, it looks like
the fifo has overflowed.
I am wondering if what is happening is that rank 0 has sent a bunch of
messages that have exhausted the resources, such that one of the middle
ranks, which is in the process of sending, cannot send and therefore never
gets to the point of trying to receive the messages from rank 0.
Is the above a possible scenario, or are messages periodically bled off
the SM BTL's fifos?
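One way I could probe that theory from the application side is to throttle
rank 0 so it can never run more than a fixed number of messages ahead of the
last rank. This is just a sketch, not the failing test above; the ack
interval, the extra tag (2), and the throttling scheme itself are my own
arbitrary choices:

! Sketch only (not the failing test above): same bucket brigade, but the
! last rank sends an acknowledgement straight back to rank 0 every
! ACK_INTERVAL messages, so rank 0 can never get more than ACK_INTERVAL
! messages ahead of the tail.  ACK_INTERVAL and the ack tag (2) are
! arbitrary choices.
      include "mpif.h"
      integer(8), parameter :: ACK_INTERVAL = 1000
      real(8) :: duration = 10, endtime
      integer(8) :: nmsgs = 0, nrecvd = 0
      integer :: ack = 0
      logical :: keep_going = .true.

      call MPI_Init(ier)
      call MPI_Comm_rank(MPI_COMM_WORLD, me, ier)
      call MPI_Comm_size(MPI_COMM_WORLD, np, ier)

      if ( me == 0 ) then
         endtime = MPI_Wtime() + duration
         do while ( MPI_Wtime() < endtime )
            call MPI_Send(keep_going,1,MPI_LOGICAL,1,1,MPI_COMM_WORLD,ier)
            nmsgs = nmsgs + 1
            ! Block here until the tail has caught up with this batch.
            if ( mod(nmsgs,ACK_INTERVAL) == 0 ) &
               call MPI_Recv(ack,1,MPI_INTEGER,np-1,2,MPI_COMM_WORLD, &
                             MPI_STATUS_IGNORE,ier)
         end do
         keep_going = .false.
         call MPI_Send(keep_going,1,MPI_LOGICAL,1,1,MPI_COMM_WORLD,ier)
      else
         do while ( keep_going )
            call MPI_Recv(keep_going,1,MPI_LOGICAL,me-1,1,MPI_COMM_WORLD, &
                          MPI_STATUS_IGNORE,ier)
            nrecvd = nrecvd + 1
            ! Middle ranks relay; the last rank acks every ACK_INTERVAL
            ! messages (but not after the shutdown message).
            if ( me < np-1 ) &
               call MPI_Send(keep_going,1,MPI_LOGICAL,me+1,1,MPI_COMM_WORLD,ier)
            if ( me == np-1 .and. keep_going .and. &
                 mod(nrecvd,ACK_INTERVAL) == 0 ) &
               call MPI_Send(ack,1,MPI_INTEGER,0,2,MPI_COMM_WORLD,ier)
         end do
      end if
      call MPI_Finalize(ier)
      end

If the hang disappears with that throttle in place, that would point at fifo
exhaustion rather than something else in the SM BTL; it obviously isn't a
fix, since the BTL should survive the unthrottled case too.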
Note, I have seen np=3 pass sometimes, and I can get it to pass reliably
if I raise the shared memory space used by the BTL. This is using the
trunk.
--td
--
Gleb.