On Fri, May 1, 2009 at 6:38 AM, Ralph Castain <r...@open-mpi.org>
wrote:
I'm not entirely sure if David is going to be in today, so I
will answer for him (and let him correct me later!).
This code is indeed representative of what the app is doing.
Basically, the user repeatedly splits the communicator so he
can run mini test cases before going on to the larger
computation. So it is always the base communicator being
repeatedly split and freed.
I would suspect, therefore, that the quick fix would serve us
just fine while the worst case is later resolved.
Thanks
Ralph
On Fri, May 1, 2009 at 6:08 AM, Edgar Gabriel <gabr...@cs.uh.edu>
wrote:
David,
is this code representative of what your app is doing? That is, you have a base communicator (e.g. MPI_COMM_WORLD) which is split, freed, split again, freed again, and so on? In other words, the important aspect is that the same 'base' communicator is used for deriving new communicators again and again?

The reason I ask is two-fold. One, you would in that case be one of the ideal beneficiaries of the block cid algorithm :-) (even if it fails you right now). Two, a fix for this scenario, which basically tries to reuse the last block used (and which would fix your case if the condition holds), is roughly five lines of code. This would give us the possibility of getting a fix quickly into the trunk and v1.3 (keep in mind that the block-cid code has been in the trunk for two years and this is the first problem we have seen with it), and it would give us more time to develop a thorough solution for the worst case - a chain of communicators being created, e.g. communicator 1 is the basis for deriving a new comm 2, comm 2 is used to derive comm 3, and so on.
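To sketch the idea in (hypothetical) code - this is not the actual patch, and the block size, names, and global cursor here are all made up for illustration:

#include <stdint.h>

#define CID_BLOCK_SIZE 8                /* illustrative block size only */

struct cid_block {
    uint32_t first;     /* first cid of the block this comm owns    */
    uint32_t next;      /* next cid to hand out from the block      */
    uint32_t in_use;    /* cids drawn from the block not yet freed  */
};

static uint32_t next_free_block = 0;    /* cursor into the cid space */

static void cid_block_init(struct cid_block *b)
{
    b->first = next_free_block;
    b->next = b->first;
    b->in_use = 0;
    next_free_block += CID_BLOCK_SIZE;
}

static uint32_t cid_alloc(struct cid_block *b)
{
    if (b->next == b->first + CID_BLOCK_SIZE) {   /* block exhausted */
        if (0 == b->in_use)
            b->next = b->first;     /* the quick fix: reuse the block */
        else
            cid_block_init(b);      /* old behavior: burn a new block */
    }
    b->in_use++;
    return b->next++;
}

static void cid_free(struct cid_block *b)
{
    b->in_use--;    /* cids only become reusable as a whole block */
}

In the repeated split/free pattern above, in_use drops back to zero before the block is exhausted, so the same block gets reused forever instead of marching through the cid space. The real complication is that all processes have to agree that the block is fully released before it can be reused.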
Thanks
Edgar
David Gunter wrote:
Here is the test code reproducer:
      program test2
      implicit none
      include 'mpif.h'
      integer ierr, myid, numprocs, i1, i2, n, local_comm,
     $     icolor, ikey, rank, root
c
c... MPI set-up
      ierr = 0
      call MPI_INIT(ierr)
      ierr = 1
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
      print *, ierr
      ierr = -1
      call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
      ierr = -5
      i1 = ierr
      if (myid.eq.0) i1 = 1
      call MPI_ALLREDUCE(i1, i2, 1, MPI_INTEGER, MPI_MIN,
     $     MPI_COMM_WORLD, ierr)
c
c... even ranks form a sub-communicator; odd ranks opt out
      ikey = myid
      if (mod(myid,2).eq.0) then
         icolor = 0
      else
         icolor = MPI_UNDEFINED
      end if
      root = 0
c
c... repeatedly split the base communicator, use it, and free it
      do n = 1, 100000
         call MPI_COMM_SPLIT(MPI_COMM_WORLD, icolor, ikey,
     $        local_comm, ierr)
         if (mod(myid,2).eq.0) then
            call MPI_COMM_RANK(local_comm, rank, ierr)
            i2 = i1
            call MPI_REDUCE(i1, i2, 1, MPI_INTEGER, MPI_MIN,
     $           root, local_comm, ierr)
            if (myid.eq.0 .and. mod(n,10).eq.0)
     $           print *, n, i1, i2, icolor, ikey
            call MPI_COMM_FREE(local_comm, ierr)
         end if
      end do
c     if (icolor.eq.0) call MPI_COMM_FREE(local_comm, ierr)
      call MPI_BARRIER(MPI_COMM_WORLD, ierr)
      call MPI_FINALIZE(ierr)
      print *, myid, ierr
      end
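(To try it, something like "mpif77 test2.f -o test2" followed by "mpirun -np 2 ./test2" should do; any process count works, since even ranks join local_comm and odd ranks pass MPI_UNDEFINED.)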
-david
--
David Gunter
HPC-3: Parallel Tools Team
Los Alamos National Laboratory
On Apr 30, 2009, at 12:43 PM, David Gunter
wrote:
Just to throw out more info on this: the test code runs fine on previous versions of OMPI. It only hangs on the 1.3 line, when the cid reaches 65536.
-david
--
David Gunter
HPC-3: Parallel Tools Team
Los Alamos National Laboratory
On Apr 30, 2009, at 12:28 PM,
Edgar Gabriel wrote:
cids are in fact not recycled in the block algorithm. The problem is that comm_free is not collective, so you cannot make any assumptions about whether other procs have also released that communicator.

Nevertheless, a cid in the communicator structure is a uint32_t, so it should not hit the 64k limit there. This is not new; if there is a discrepancy between what the comm structure assumes a cid is and what the pml assumes, then this has been in the code since the very first days of Open MPI...
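To put the mismatch concretely (illustrative declarations only, simplified rather than copied from the source tree):

#include <stdint.h>

struct comm_like {
    uint32_t c_contextid;   /* cid as stored in the communicator structure */
    /* ... */
};

struct match_hdr_like {
    uint16_t hdr_ctx;       /* context id as carried in a pml match header */
    /* ... */
};

If the context id carried by the pml is 16 bits wide while the communicator happily stores 32, a ceiling around 64k would come from the pml side, not from the comm structure.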
Thanks
Edgar
Brian W. Barrett
wrote:
On Thu, 30 Apr 2009, Ralph Castain wrote:

We seem to have hit a problem here - it looks like we are seeing a built-in limit on the number of communicators one can create in a program. The program basically does a loop, calling MPI_Comm_split each time through the loop to create a sub-communicator, does a reduce operation on the members of the sub-communicator, and then calls MPI_Comm_free to release it (this is a minimized reproducer for the real code). After 64k times through the loop, the program fails.

This looks remarkably like a 16-bit index that hits a max value and then blocks. I have looked at the communicator code, but I don't immediately see such a field. Is anyone aware of some other place where we would have a limit that would cause this problem?

There's a maximum of 32768 communicator ids when using OB1 (each PML can set the max contextid, although the communicator code is the part that actually assigns a cid). Assuming that comm_free is actually properly called, there should be plenty of cids available for that pattern.

However, I'm not sure I understand the block algorithm someone added to cid allocation - I'd have to guess that there's something funny with that routine and cids aren't being recycled properly.

Brian
--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel