Ralph,

Can you point me to the small reproducer for the libevent bug that was
fixed? I'm just curious.

Thanks,
Josh

On Fri, May 4, 2012 at 4:09 PM, Ralph Castain <r...@open-mpi.org> wrote:
> FYI: 2.0.19-stable was released yesterday. I have a Mercurial repo all set to 
> go:
>
> https://bitbucket.org/rhc/ompi-libevent2019
>
> Please check it out - timeout is now set for May 11th.
>
> Thanks
> Ralph
>
> On May 1, 2012, at 8:38 AM, Ralph Castain wrote:
>
>> WHAT:  Update libevent to 2.0.19 release
>>
>> WHEN:  As soon as it is released, expected around May 11
>>
>> WHY:   The 2.0.19 release contains a critical fix for a bug I recently
>> discovered in the libevent 2.0.x series
>>
>>
>> Details:
>> I discovered a bug in libevent over the last few days that causes it to 
>> unexpectedly "invert" event priorities. It is a slightly subtle bug, but we 
>> were able to provide a simple reproducer and so the libevent folks were able 
>> to quickly implement a fix.
>>
>> Stated simply, if you were in an event of a given priority and activated an 
>> event of higher priority, that new event would not get serviced if any event 
>> of the current priority were to become active prior to leaving the current 
>> event. In other words, libevent would service all active events of the 
>> current priority before even looking to see if a higher priority event was 
>> active.
>>
>> The patch adds the following logic to event_active:
>>
>>>   IF <I am in an event> AND
>>>      <ev->base> EQ <current-base> AND
>>>      <pri> LT <current-pri> THEN
>>>          <rescan queues on next loop>
>>
>>
>> Thus, a rescan occurs only if a higher priority event becomes active during
>> an event of lower priority. ORTE relies on higher priority events being
>> serviced promptly in order to handle errors: without the change, an error
>> reported in a message from a daemon (for example) cannot be serviced until
>> ALL messages that arrive during the processing of that message have been
>> handled. On a large cluster that is receiving a long list of messages, this
>> prevents the error from being handled for quite some time.
>>
>>
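The behavior described in the Details above can be sketched with a small toy
model. This is not libevent code and not the reproducer that was sent
upstream: ToyBase, activate, loop, run and the message names are all made up
for illustration, and lower numbers are assumed to mean higher priority. The
rescan_fix flag switches on the check from the pseudocode (there is only one
base in the model, so the base comparison is implicit).

```python
from collections import deque


class ToyBase:
    """Toy prioritized event loop; priority 0 is the highest (assumption)."""

    def __init__(self, npriorities, rescan_fix):
        self.queues = [deque() for _ in range(npriorities)]
        self.rescan_fix = rescan_fix
        self.running_pri = None  # priority of the callback now running, if any
        self.rescan = False      # set when the queues must be rescanned
        self.log = []            # names of events in the order they were serviced

    def activate(self, pri, name, callback=None):
        """Make an event active on the queue for its priority."""
        self.queues[pri].append((name, callback))
        # Patched logic: inside an event, and the new event outranks it.
        if (self.rescan_fix and self.running_pri is not None
                and pri < self.running_pri):
            self.rescan = True

    def _process_queue(self, pri):
        """Service one priority level until it is empty or a rescan is asked for."""
        queue = self.queues[pri]
        self.running_pri = pri
        while queue:
            name, callback = queue.popleft()
            self.log.append(name)
            if callback is not None:
                callback(self)
            if self.rescan:
                break  # stop early so the loop can look at higher priorities
        self.running_pri = None

    def loop(self):
        """Repeatedly service the highest-priority non-empty queue."""
        while any(self.queues):
            self.rescan = False
            for pri, queue in enumerate(self.queues):
                if queue:
                    self._process_queue(pri)
                    break


def on_msg1(base):
    # While handling a message, an error is raised (higher priority) and
    # another message of the current priority arrives.
    base.activate(0, "error")
    base.activate(1, "msg3")


def run(rescan_fix):
    base = ToyBase(npriorities=2, rescan_fix=rescan_fix)
    base.activate(1, "msg1", on_msg1)
    base.activate(1, "msg2")
    base.loop()
    return base.log


if __name__ == "__main__":
    print(run(rescan_fix=False))  # ['msg1', 'msg2', 'msg3', 'error']
    print(run(rescan_fix=True))   # ['msg1', 'error', 'msg2', 'msg3']
```

Without the check, the error waits behind every message that is already
queued or arrives in the meantime; with it, the error is serviced as soon as
the current callback returns.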
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey
