BTW: when compiling Brian's change, I got a warning about comparing signed
and unsigned. Sure enough, I found that the communicator id is defined as an
unsigned int, while the PML is treating it as a *signed* int.

We need to get this corrected - which way do you want it to be?

I will add this requirement to the ticket...

Thanks
Ralph


On Fri, May 1, 2009 at 6:38 AM, Ralph Castain <r...@open-mpi.org> wrote:

> I'm not entirely sure if David is going to be in today, so I will answer
> for him (and let him correct me later!).
>
> This code is indeed representative of what the app is doing. Basically, the
> user repeatedly splits the communicator so he can run mini test cases before
> going on to the larger computation. So it is always the base communicator
> being repeatedly split and freed.
>
> I would suspect, therefore, that the quick fix would serve us just fine
> while the worst case is later resolved.
>
> Thanks
> Ralph
>
>
> On Fri, May 1, 2009 at 6:08 AM, Edgar Gabriel <gabr...@cs.uh.edu> wrote:
>
>> David,
>>
>> is this code representative of what your app is doing? E.g., you have a
>> base communicator (e.g. MPI_COMM_WORLD) which is repeatedly split, freed,
>> split again, freed again, etc.? I.e., the important aspect is that the same
>> 'base' communicator is used to derive new communicators again and again?
>>
>> The reason I ask is two-fold: one, you would in that case be one of the
>> ideal beneficiaries of the block cid algorithm :-) (even if it fails you
>> right now); two, a fix for this scenario, which basically tries to reuse
>> the last block used (and which would fix your case if the condition holds),
>> is roughly five lines of code. This would give us the possibility of having
>> a fix quickly in the trunk and v1.3 (keep in mind that the block-cid code
>> has been in the trunk for two years and this is the first problem we have
>> seen), and give us more time to develop a thorough solution for the worst
>> case - a chain of communicators being created, e.g. communicator 1 is the
>> basis for deriving a new comm 2, comm 2 is then used to derive comm 3, etc.
>>
>> Thanks
>> Edgar
>>
>> David Gunter wrote:
>>
>>> Here is the test code reproducer:
>>>
>>>      program test2
>>>      implicit none
>>>      include 'mpif.h'
>>>      integer ierr, myid, numprocs,i1,i2,n,local_comm,
>>>     $     icolor,ikey,rank,root
>>>
>>> c
>>> c...  MPI set-up
>>>      ierr = 0
>>>      call MPI_INIT(IERR)
>>>      ierr = 1
>>>      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
>>>      print *, ierr
>>>
>>>      ierr = -1
>>>
>>>      CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
>>>
>>>      ierr = -5
>>>      i1 = ierr
>>>      if (myid.eq.0) i1 = 1
>>>      call mpi_allreduce(i1, i2, 1,MPI_integer,MPI_MIN,
>>>     $     MPI_COMM_WORLD,ierr)
>>>
>>>      ikey = myid
>>>      if (mod(myid,2).eq.0) then
>>>         icolor = 0
>>>      else
>>>         icolor = MPI_UNDEFINED
>>>      end if
>>>
>>>      root = 0
>>>      do n = 1, 100000
>>>
>>>         call MPI_COMM_SPLIT(MPI_COMM_WORLD, icolor,
>>>     $        ikey, local_comm, ierr)
>>>
>>>         if (mod(myid,2).eq.0) then
>>>            CALL MPI_COMM_RANK(local_comm, rank, ierr)
>>>            i2 = i1
>>>            call mpi_reduce(i1, i2, 1,MPI_integer,MPI_MIN,
>>>     $           root, local_comm,ierr)
>>>
>>>            if (myid.eq.0.and.mod(n,10).eq.0)
>>>     $           print *, n, i1, i2,icolor,ikey
>>>
>>>            call mpi_comm_free(local_comm, ierr)
>>>         end if
>>>
>>>      end do
>>> c      if (icolor.eq.0) call mpi_comm_free(local_comm, ierr)
>>>
>>>
>>>
>>>      call MPI_BARRIER(MPI_COMM_WORLD, ierr)
>>>
>>>      call MPI_FINALIZE(IERR)
>>>
>>>      print *, myid, ierr
>>>
>>>      end
>>>
>>>
>>>
>>> -david
>>> --
>>> David Gunter
>>> HPC-3: Parallel Tools Team
>>> Los Alamos National Laboratory
>>>
>>>
>>>
>>> On Apr 30, 2009, at 12:43 PM, David Gunter wrote:
>>>
>>>  Just to throw out more info on this, the test code runs fine on previous
>>>> versions of OMPI.  It only hangs on the 1.3 line when the cid reaches 
>>>> 65536.
>>>>
>>>> -david
>>>> --
>>>> David Gunter
>>>> HPC-3: Parallel Tools Team
>>>> Los Alamos National Laboratory
>>>>
>>>>
>>>>
>>>> On Apr 30, 2009, at 12:28 PM, Edgar Gabriel wrote:
>>>>
>>>>  cid's are in fact not recycled in the block algorithm. The problem is
>>>>> that comm_free is not collective, so you can not make any assumptions
>>>>> whether other procs have also released that communicator.
>>>>>
>>>>>
>>>>> But nevertheless, a cid in the communicator structure is a uint32_t, so
>>>>> it should not hit the 16k limit there yet. this is not new, so if there 
>>>>> is a
>>>>> discrepancy between what the comm structure assumes that a cid is and what
>>>>> the pml assumes, than this was in the code since the very first days of 
>>>>> Open
>>>>> MPI...
>>>>>
>>>>> Thanks
>>>>> Edgar
>>>>>
>>>>> Brian W. Barrett wrote:
>>>>>
>>>>>> On Thu, 30 Apr 2009, Ralph Castain wrote:
>>>>>>
>>>>>>> We seem to have hit a problem here - it looks like we are seeing a
>>>>>>> built-in limit on the number of communicators one can create in a
>>>>>>> program. The program basically does a loop, calling MPI_Comm_split
>>>>>>> each
>>>>>>> time through the loop to create a sub-communicator, does a reduce
>>>>>>> operation on the members of the sub-communicator, and then calls
>>>>>>> MPI_Comm_free to release it (this is a minimized reproducer for the
>>>>>>> real
>>>>>>> code). After 64k times through the loop, the program fails.
>>>>>>>
>>>>>>> This looks remarkably like a 16-bit index that hits a max value and
>>>>>>> then
>>>>>>> blocks.
>>>>>>>
>>>>>>> I have looked at the communicator code, but I don't immediately see
>>>>>>> such
>>>>>>> a field. Is anyone aware of some other place where we would have a
>>>>>>> limit
>>>>>>> that would cause this problem?
>>>>>>>
>>>>>> There's a maximum of 32768 communicator ids when using OB1 (each PML
>>>>>> can set the max contextid, although the communicator code is the part 
>>>>>> that
>>>>>> actually assigns a cid).  Assuming that comm_free is actually properly
>>>>>> called, there should be plenty of cids available for that pattern. 
>>>>>> However,
>>>>>> I'm not sure I understand the block algorithm someone added to cid
>>>>>> allocation - I'd have to guess that there's something funny with that
>>>>>> routine and cids aren't being recycled properly.
>>>>>> Brian
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>
>>>>>
>>>>> --
>>>>> Edgar Gabriel
>>>>> Assistant Professor
>>>>> Parallel Software Technologies Lab      http://pstl.cs.uh.edu
>>>>> Department of Computer Science          University of Houston
>>>>> Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
>>>>> Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
