BTW: when compiling Brian's change, I got a warning about comparing signed and unsigned. Sure enough, I found that the communicator id is defined as an unsigned int, while the PML is treating it as a *signed* int.
We need to get this corrected - which way do you want it to be? I will add this requirement to the ticket...

Thanks
Ralph

On Fri, May 1, 2009 at 6:38 AM, Ralph Castain <r...@open-mpi.org> wrote:

> I'm not entirely sure if David is going to be in today, so I will answer
> for him (and let him correct me later!).
>
> This code is indeed representative of what the app is doing. Basically,
> the user repeatedly splits the communicator so he can run mini test cases
> before going on to the larger computation. So it is always the base
> communicator being repeatedly split and freed.
>
> I would suspect, therefore, that the quick fix would serve us just fine
> while the worst case is later resolved.
>
> Thanks
> Ralph
>
>
> On Fri, May 1, 2009 at 6:08 AM, Edgar Gabriel <gabr...@cs.uh.edu> wrote:
>
>> David,
>>
>> is this code representative of what your app is doing? E.g., you have a
>> base communicator (e.g. MPI_COMM_WORLD) which is being split, freed
>> again, split, freed again, etc.? I.e., the important aspect is that the
>> same base communicator is being used for deriving new communicators
>> again and again?
>>
>> The reason I ask is two-fold. One, you would in that case be one of the
>> ideal beneficiaries of the block cid algorithm :-) (even if it fails you
>> right now). Two, a fix for this scenario, which basically tries to reuse
>> the last block used (and which would fix your case if the condition is
>> true), is roughly five lines of code. This would give us the possibility
>> of having a fix quickly in the trunk and v1.3 (keep in mind that the
>> block-cid code has been in the trunk for two years and this is the first
>> problem that we have had) and give us more time to develop a profound
>> solution for the worst case: a chain of communicators being created,
>> e.g. communicator 1 is the basis for deriving a new comm 2, comm 2 is
>> then used to derive comm 3, etc.
>>
>> Thanks
>> Edgar
>>
>> David Gunter wrote:
>>
>>> Here is the test code reproducer:
>>>
>>>       program test2
>>>       implicit none
>>>       include 'mpif.h'
>>>       integer ierr, myid, numprocs, i1, i2, n, local_comm,
>>>      $     icolor, ikey, rank, root
>>> c
>>> c... MPI set-up
>>>       ierr = 0
>>>       call MPI_INIT(ierr)
>>>       ierr = 1
>>>       call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
>>>       print *, ierr
>>>
>>>       ierr = -1
>>>       call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
>>>
>>>       ierr = -5
>>>       i1 = ierr
>>>       if (myid.eq.0) i1 = 1
>>>       call mpi_allreduce(i1, i2, 1, MPI_INTEGER, MPI_MIN,
>>>      $     MPI_COMM_WORLD, ierr)
>>>
>>>       ikey = myid
>>>       if (mod(myid,2).eq.0) then
>>>          icolor = 0
>>>       else
>>>          icolor = MPI_UNDEFINED
>>>       end if
>>>
>>>       root = 0
>>>       do n = 1, 100000
>>>
>>>          call MPI_COMM_SPLIT(MPI_COMM_WORLD, icolor,
>>>      $        ikey, local_comm, ierr)
>>>
>>>          if (mod(myid,2).eq.0) then
>>>             call MPI_COMM_RANK(local_comm, rank, ierr)
>>>             i2 = i1
>>>             call mpi_reduce(i1, i2, 1, MPI_INTEGER, MPI_MIN,
>>>      $           root, local_comm, ierr)
>>>
>>>             if (myid.eq.0 .and. mod(n,10).eq.0)
>>>      $           print *, n, i1, i2, icolor, ikey
>>>
>>>             call mpi_comm_free(local_comm, ierr)
>>>          end if
>>>
>>>       end do
>>> c     if (icolor.eq.0) call mpi_comm_free(local_comm, ierr)
>>>
>>>       call MPI_BARRIER(MPI_COMM_WORLD, ierr)
>>>
>>>       call MPI_FINALIZE(ierr)
>>>
>>>       print *, myid, ierr
>>>
>>>       end
>>>
>>> -david
>>> --
>>> David Gunter
>>> HPC-3: Parallel Tools Team
>>> Los Alamos National Laboratory
>>>
>>>
>>> On Apr 30, 2009, at 12:43 PM, David Gunter wrote:
>>>
>>>> Just to throw out more info on this, the test code runs fine on
>>>> previous versions of OMPI. It only hangs on the 1.3 line when the cid
>>>> reaches 65536.
>>>>
>>>> -david
>>>>
>>>> On Apr 30, 2009, at 12:28 PM, Edgar Gabriel wrote:
>>>>
>>>>> cid's are in fact not recycled in the block algorithm.
>>>>> The problem is that comm_free is not collective, so you cannot make
>>>>> any assumptions about whether other procs have also released that
>>>>> communicator.
>>>>>
>>>>> But nevertheless, a cid in the communicator structure is a uint32_t,
>>>>> so it should not hit the 16k limit there yet. This is not new, so if
>>>>> there is a discrepancy between what the comm structure assumes a cid
>>>>> is and what the pml assumes, then this has been in the code since the
>>>>> very first days of Open MPI...
>>>>>
>>>>> Thanks
>>>>> Edgar
>>>>>
>>>>> Brian W. Barrett wrote:
>>>>>
>>>>>> On Thu, 30 Apr 2009, Ralph Castain wrote:
>>>>>>
>>>>>>> We seem to have hit a problem here - it looks like we are seeing a
>>>>>>> built-in limit on the number of communicators one can create in a
>>>>>>> program. The program basically does a loop, calling MPI_Comm_split
>>>>>>> each time through the loop to create a sub-communicator, does a
>>>>>>> reduce operation on the members of the sub-communicator, and then
>>>>>>> calls MPI_Comm_free to release it (this is a minimized reproducer
>>>>>>> for the real code). After 64k times through the loop, the program
>>>>>>> fails.
>>>>>>>
>>>>>>> This looks remarkably like a 16-bit index that hits a max value and
>>>>>>> then blocks.
>>>>>>>
>>>>>>> I have looked at the communicator code, but I don't immediately see
>>>>>>> such a field. Is anyone aware of some other place where we would
>>>>>>> have a limit that would cause this problem?
>>>>>>
>>>>>> There's a maximum of 32768 communicator ids when using OB1 (each PML
>>>>>> can set the max contextid, although the communicator code is the
>>>>>> part that actually assigns a cid). Assuming that comm_free is
>>>>>> actually properly called, there should be plenty of cids available
>>>>>> for that pattern.
>>>>>> However, I'm not sure I understand the block algorithm someone added
>>>>>> to cid allocation - I'd have to guess that there's something funny
>>>>>> with that routine and cids aren't being recycled properly.
>>>>>>
>>>>>> Brian
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>
>>>>> --
>>>>> Edgar Gabriel
>>>>> Assistant Professor
>>>>> Parallel Software Technologies Lab      http://pstl.cs.uh.edu
>>>>> Department of Computer Science          University of Houston
>>>>> Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
>>>>> Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335