Tested -- seems to work for me. I say we now let MTT sort it out (i.e., see if others hit this race condition) and apply it to v1.3.

On Jun 9, 2009, at 4:46 AM, Ralph Castain wrote:

I don't think it would be very hard - I would have to create a patch
for it, but the fix is completely contained in one file and location.

I would like to have someone else test it, though, before we move it
across. It worked for me, but since it is a race condition, that isn't
entirely convincing.


On Jun 9, 2009, at 5:41 AM, Jeff Squyres wrote:

> I'd be in favor of bringing this to v1.3.  Are there other
> dependencies / would it be difficult?
>
>
> Begin forwarded message:
>
>> From: "Open MPI" <b...@open-mpi.org>
>> Date: June 8, 2009 11:31:20 AM PDT
>> Cc: <b...@osl.iu.edu>
>> Subject: Re: [Open MPI] #1927: v1.3 COMM_SPAWN loop test fails after ~120 spawns
>>
>> #1927: v1.3 COMM_SPAWN loop test fails after ~120 spawns
>> -----------------------+----------------------------------------------------
>> Reporter:  jsquyres    |        Owner:  rhc
>>     Type:  defect      |       Status:  closed
>> Priority:  critical    |    Milestone:  Open MPI 1.3.4
>>  Version:  1.3 branch  |   Resolution:  fixed
>> Keywords:              |
>> -----------------------+----------------------------------------------------
>> Changes (by rhc):
>>
>>  * status:  new => closed
>>  * resolution:  => fixed
>>
>>
>> Comment:
>>
>> This was due to a very tight loop on comm_spawn not giving enough time
>> for the prior proc to completely terminate (and thus free its file
>> descriptors) before the next proc was launched. Eventually, we built up
>> a backlog of terminations to process and ran out of fd's.
>>
>> We introduced a check-and-delay in the code that detects we don't have
>> enough fd's to launch another proc, and then waits a second to see if
>> enough become free before aborting.
>>
>> Fixed in trunk - can see if we want to bring it to 1.3.
>>
>> --
>> Ticket URL: <https://svn.open-mpi.org/trac/ompi/ticket/1927#comment:3>
>> Open MPI <http://www.open-mpi.org/>
>>
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Jeff Squyres
Cisco Systems
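
For anyone following the thread who wants a picture of the check-and-delay idea described in the ticket comment, here is a minimal sketch in Python. It is not the Open MPI code (the actual fix is in C, in the trunk); the names `can_open` and `wait_for_fds`, the single retry, and the probe-by-opening approach are all assumptions made for illustration. The only details taken from the ticket are the one-second wait and giving up if enough fd's still are not free afterwards.

```python
import os
import time


def can_open(needed):
    """Hypothetical probe: can this process open `needed` more descriptors now?"""
    held = []
    try:
        for _ in range(needed):
            held.append(os.open(os.devnull, os.O_RDONLY))
        return True
    except OSError:
        # Typically EMFILE/ENFILE: the descriptor table is full.
        return False
    finally:
        # Release the probe descriptors whether or not the probe succeeded.
        for fd in held:
            os.close(fd)


def wait_for_fds(needed, delay=1.0, retries=1, probe=can_open, sleep=time.sleep):
    """Check for `needed` free descriptors; on failure, wait and re-check.

    Returns True as soon as a check succeeds, or False once all retries are
    used up, at which point the caller would abort the launch.
    """
    for attempt in range(retries + 1):
        if probe(needed):
            return True
        if attempt < retries:
            sleep(delay)  # give earlier procs time to terminate and free fd's
    return False
```

A launcher would call something like `wait_for_fds(fds_per_proc)` before each spawn and abort only when it returns False. The `probe` and `sleep` parameters exist just so the logic can be exercised without actually exhausting descriptors.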
