Thanks Edgar!! Much appreciated...

On May 2, 2009, at 12:08 PM, Edgar Gabriel wrote:

ok, r21142 should fix the problem for the app. I tested it with a number of scenarios (e.g. all intra-comm cases, inter-comm cases, intercomm_merge etc.), but I would suggest letting at least one night of MTT runs go over it before we file a CMR for 1.3 ...

Thanks
Edgar


On Fri, May 1, 2009 at 6:38 AM, Ralph Castain <r...@open-mpi.org> wrote:
     I'm not entirely sure if David is going to be in today, so I
     will answer for him (and let him correct me later!).

     This code is indeed representative of what the app is doing.
     Basically, the user repeatedly splits the communicator so he
     can run mini test cases before going on to the larger
     computation. So it is always the base communicator being
     repeatedly split and freed.

     I would suspect, therefore, that the quick fix would serve us
     just fine while the worst case is later resolved.

     Thanks
     Ralph


On Fri, May 1, 2009 at 6:08 AM, Edgar Gabriel <gabr...@cs.uh.edu> wrote:
     David,

     is this code representative of what your app is doing?
     E.g. you have a base communicator (e.g. MPI_COMM_WORLD)
     which is being 'split', freed again, split, freed again
     etc.? I.e. the important aspect is that the same
     'base' communicator is being used for deriving new
     communicators again and again?

     The reason I ask is two-fold: one, you would in that
     case be one of the ideal beneficiaries of the block cid
     algorithm :-) (even if it fails you right now); two, a
     fix for this scenario, which basically tries to reuse
     the last block used (and which would fix your case if
     the condition is true), is roughly five lines of code.
     This would give us the possibility to have a fix
     quickly in the trunk and v1.3 (keep in mind that the
     block-cid code has been in the trunk for two years and
     this is the first problem that we have had) and give us
     more time to develop a thorough solution for the worst
     case: a chain of communicators being created, e.g.
     communicator 1 is the basis to derive a new comm 2,
     comm 2 is being used to derive comm 3 etc.

     Thanks
     Edgar
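
To make the difference between the two cases in the message above concrete, here is a toy model in Python. It is only a sketch under loud assumptions: it is not the Open MPI block-cid code, it collapses a "block" to a single id for brevity, the ceiling of 65536 is simply the value at which the hang is reported in this thread, and the reuse condition ("the previously derived communicator has been freed locally") is a guess at what "if the condition is true" could mean. The names ToyCidAllocator and splits_before_failure are hypothetical.

```python
MAX_CID = 65536  # hypothetical ceiling, taken from the hang reported in this thread


class ToyCidAllocator:
    """Hands out ids for communicators derived from one base communicator."""

    def __init__(self, reuse_last: bool):
        self.reuse_last = reuse_last
        self.next_free = 1       # next never-used id (0 stands in for the base)
        self.last_cid = None     # id handed out by the previous split
        self.last_freed = False  # has that communicator been freed since?

    def split(self) -> int:
        if self.reuse_last and self.last_cid is not None and self.last_freed:
            cid = self.last_cid          # assumed "quick fix": reuse the last id
        else:
            cid = self.next_free         # otherwise ids only ever grow
            self.next_free += 1
        if cid >= MAX_CID:
            raise RuntimeError("out of communicator ids")
        self.last_cid, self.last_freed = cid, False
        return cid

    def free(self, cid: int) -> None:
        if cid == self.last_cid:
            self.last_freed = True


def splits_before_failure(reuse_last: bool, iterations: int = 100000) -> int:
    """Run the split/free loop of the reproducer and count completed rounds."""
    alloc = ToyCidAllocator(reuse_last)
    done = 0
    for _ in range(iterations):
        try:
            cid = alloc.split()
        except RuntimeError:
            break
        alloc.free(cid)
        done += 1
    return done


if __name__ == "__main__":
    print(splits_before_failure(reuse_last=False))  # 65535
    print(splits_before_failure(reuse_last=True))   # 100000
```

In this model the split/free pattern on a single base communicator never runs out of ids once the last id is reused, while a chain of communicators (comm 1 derives comm 2, comm 2 derives comm 3, ...) would not be helped, which matches the "worst case" left open above.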

     David Gunter wrote:
           Here is the test code reproducer:

                 program test2
                 implicit none
                 include 'mpif.h'
                 integer ierr, myid, numprocs,i1,i2,n,local_comm,
                $     icolor,ikey,rank,root

           c
           c...  MPI set-up
                 ierr = 0
                 call MPI_INIT(IERR)
                 ierr = 1
                 CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
                 print *, ierr

                 ierr = -1

                 CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)

                 ierr = -5
                 i1 = ierr
                 if (myid.eq.0) i1 = 1
                 call mpi_allreduce(i1, i2, 1,MPI_integer,MPI_MIN,
                $     MPI_COMM_WORLD,ierr)

                 ikey = myid
                 if (mod(myid,2).eq.0) then
                    icolor = 0
                 else
                    icolor = MPI_UNDEFINED
                 end if

                 root = 0
                 do n = 1, 100000

                    call MPI_COMM_SPLIT(MPI_COMM_WORLD, icolor,
                $        ikey, local_comm, ierr)

                    if (mod(myid,2).eq.0) then
                       CALL MPI_COMM_RANK(local_comm, rank, ierr)
                       i2 = i1
                       call mpi_reduce(i1, i2, 1,MPI_integer,MPI_MIN,
                $           root, local_comm,ierr)

                       if (myid.eq.0.and.mod(n,10).eq.0)
                $           print *, n, i1, i2,icolor,ikey

                       call mpi_comm_free(local_comm, ierr)
                    end if

                 end do
           c      if (icolor.eq.0) call mpi_comm_free(local_comm, ierr)

                 call MPI_barrier(MPi_COMM_WORLD,ierr)

                 call MPI_FINALIZE(IERR)

                 print *, myid, ierr

                 end



           -david
           --
           David Gunter
           HPC-3: Parallel Tools Team
           Los Alamos National Laboratory



           On Apr 30, 2009, at 12:43 PM, David Gunter wrote:

                 Just to throw out more info on this, the test code runs
                 fine on previous versions of OMPI. It only hangs on the
                 1.3 line when the cid reaches 65536.

                 -david



                 On Apr 30, 2009, at 12:28 PM, Edgar Gabriel wrote:

                       cid's are in fact not recycled in the block
                       algorithm. The problem is that comm_free is not
                       collective, so you cannot make any assumptions
                       about whether other procs have also released
                       that communicator.

                       But nevertheless, a cid in the communicator
                       structure is a uint32_t, so it should not hit
                       the 16-bit limit there yet. This is not new, so
                       if there is a discrepancy between what the comm
                       structure assumes a cid is and what the pml
                       assumes, then this has been in the code since
                       the very first days of Open MPI...

                       Thanks
                       Edgar
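
The width discrepancy discussed above can be illustrated with plain integer arithmetic. The Python sketch below is only an illustration: the assumption that some message header stores the cid in an unsigned 16-bit field comes from the "16-bit index" guess in this thread, not from the Open MPI source, and the helper names to_uint16 and first_mismatch are hypothetical.

```python
def to_uint16(cid: int) -> int:
    """Value left after storing cid in an (assumed) unsigned 16-bit field."""
    return cid & 0xFFFF


def first_mismatch(limit: int = 1 << 32) -> int:
    """Smallest cid whose 16-bit copy differs from the 32-bit original."""
    cid = 0
    while cid < limit and to_uint16(cid) == cid:
        cid += 1
    return cid


if __name__ == "__main__":
    print(first_mismatch())   # 65536
    print(to_uint16(65536))   # 0
```

Under that assumption, a uint32_t cid is represented faithfully up to 65535, and 65536 is the first value that wraps to 0, which is the value at which the hang is reported.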

                       Brian W. Barrett wrote:
                             On Thu, 30 Apr 2009, Ralph Castain wrote:
                                   We seem to have hit a problem here - it looks
                                   like we are seeing a built-in limit on the
                                   number of communicators one can create in a
                                   program. The program basically does a loop,
                                   calling MPI_Comm_split each time through the
                                   loop to create a sub-communicator, does a
                                   reduce operation on the members of the
                                   sub-communicator, and then calls
                                   MPI_Comm_free to release it (this is a
                                   minimized reproducer for the real code).
                                   After 64k times through the loop, the
                                   program fails.

                                   This looks remarkably like a 16-bit index
                                   that hits a max value and then blocks.

                                   I have looked at the communicator code, but
                                   I don't immediately see such a field. Is
                                   anyone aware of some other place where we
                                   would have a limit that would cause this
                                   problem?

                             There's a maximum of 32768 communicator ids when
                             using OB1 (each PML can set the max contextid,
                             although the communicator code is the part that
                             actually assigns a cid). Assuming that comm_free
                             is actually properly called, there should be
                             plenty of cids available for that pattern.
                             However, I'm not sure I understand the block
                             algorithm someone added to cid allocation - I'd
                             have to guess that there's something funny with
                             that routine and cids aren't being recycled
                             properly.

                             Brian
                             _______________________________________________
                             devel mailing list
                             de...@open-mpi.org
                             http://www.open-mpi.org/mailman/listinfo.cgi/devel


                       --
                       Edgar Gabriel
                       Assistant Professor
                       Parallel Software Technologies Lab      http://pstl.cs.uh.edu
                       Department of Computer Science          University of Houston
                       Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
                       Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335