I've been digging further into this, and I believe I have much of it resolved 
now. However, I have encountered a problem that appears to be something in 
libevent itself.

I configured libevent with debug enabled and turned it on at execution, and 
was immediately barraged by:

[warn] select: Invalid argument

Digging further into the reason, I found that the message comes from the 
following code in select_dispatch (file select.c):

        res = select(nfds, sop->event_readset_out,
            sop->event_writeset_out, NULL, tv);

        EVBASE_ACQUIRE_LOCK(base, th_base_lock);

        check_selectop(sop);

        if (res == -1) {
                if (errno != EINTR) {
                        event_warn("select");
                        return (-1);
                }

                return (0);
        }

The timeout value supplied to select_dispatch is being corrupted after the 
first pass through the routine: it arrives as {0, 0} the first time, but holds 
an illegal value thereafter, which is why select() fails with EINVAL 
("Invalid argument"). Resetting the timeout to its original value resolves 
the problem.
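
For reference, here is a minimal sketch of that workaround outside libevent. 
It assumes the caller keeps the intended timeout somewhere safe and hands 
select() a fresh copy on every pass (on Linux, select() may itself modify the 
timeval it is given). The helper names timeval_is_valid and poll_rounds are 
made up for illustration and are not part of libevent:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: returns 1 if tv is a timeout select() can accept
 * (non-negative seconds, microseconds in [0, 1000000)), 0 otherwise. */
static int timeval_is_valid(const struct timeval *tv)
{
    return tv->tv_sec >= 0 && tv->tv_usec >= 0 && tv->tv_usec < 1000000;
}

/* Hypothetical helper: calls select() up to `rounds` times with the same
 * logical timeout and no file descriptors. Each call gets a fresh copy of
 * *timeout, so nothing select() writes into its argument carries over.
 * Returns the number of calls that succeeded or were merely interrupted,
 * or -1 if the supplied timeout is already invalid. */
static int poll_rounds(const struct timeval *timeout, int rounds)
{
    int ok = 0;

    assert(timeout != NULL);
    if (!timeval_is_valid(timeout))
        return -1;

    for (int i = 0; i < rounds; i++) {
        struct timeval tv = *timeout;   /* reset to the original value */
        int res = select(0, NULL, NULL, NULL, &tv);

        /* Same policy as the select_dispatch excerpt above: EINTR is
         * not treated as an error, anything else stops the loop. */
        if (res == -1 && errno != EINTR)
            break;
        ok++;
    }
    return ok;
}
```

With a {0, 0} timeout, poll_rounds(&tv, n) should complete all n passes; a 
timeval with a negative or out-of-range tv_usec is rejected up front rather 
than being passed on to select().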

Obviously, removing debug "quiets" the message barrage - but I wonder if 
something else is going on here, or if there is a bug in libevent itself?

Thanks
Ralph


On Jan 6, 2012, at 12:47 PM, Ralph Castain wrote:

> Afraid I'm going to have to eat my words here, Nick. It looks like something 
> is going on in the code - not entirely sure just where yet (mine or 
> libevent). I've installed a clean version of 2.0.13 (removing everything but 
> the glue) into OMPI, and the problems persist. I've also tried converting to 
> a true fd-based event using pipes, and get the identical behavior.
> 
> I'm going to spend some more time over the weekend looking at this before 
> begging more of your time on it. I'm hoping to pin it down a little more for 
> you, or at least provide an updated reproducer.
> 
> Thanks again
> Ralph
> 
> On Jan 6, 2012, at 8:15 AM, Ralph Castain wrote:
> 
>> 
>> On Jan 6, 2012, at 7:02 AM, Nick Mathewson wrote:
>> 
>>> On Fri, Jan 6, 2012 at 7:24 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> If it helps, I have now confirmed that I *can* activate the t2 event 
>>>> during the t1func callback in my example *provided* I called event_assign 
>>>> on it prior to entering event_base_loop. It is also okay for me to 
>>>> event_add the t2 event during the callback - I am simply not allowed to 
>>>> event_assign *and* activate it there.
>>>> 
>>>> However, it is okay to assign the event during the callback so long as I 
>>>> don't activate it until after I return.
>>>> 
>>>> Seems a little strange to me - is this the intended behavior?
>>> 
>>> Well, no, of course not.
>>> 
>>> Looking at your code, the only weird thing I see at first glance is
>>> that you are calling event_add() on t1 and t2 -- you shouldn't be
>>> doing that.  event_add() is only for events that you want libevent to
>>> poll or wait for, but waiting for EV_WRITE on fd -1 isn't
>>> well-defined.  If you want to activate them yourself with
>>> event_active(), there's no need to event_add() them.
>>> 
>>> That shouldn't be causing this problem, though, I think.  (Unless it is?)
>> 
>> BINGO! Indeed, event_add was the source of the trouble. My bad for not 
>> understanding when event_add was required.
>> 
>>> 
>>> I just tried your test programs, though, and they worked okay for me
>>> on OSX and linux, using Libevent 2.0.13-stable and Libevent
>>> 2.0.14-stable.
>>> 
>>> What platform are you running your tests on?  Have you tried other
>>> platforms too?  Does the outcome depend on which libevent backend is
>>> in use?  Have you tried this with an unpatched Libevent, just to
>>> confirm that it's not introduced by any openmpi patches?
>> 
>> FWICT, it is a corruption issue, and so it does indeed depend on platform 
>> and backend - just a question of what memory location gets trounced.
>> 
>> FWIW, I was conducting my tests on OSX and linux as well, using OMPI with 
>> 2.0.13 underneath. I think the difference in our results is due to the 
>> location issue - I suspect that you might also hit a problem if we continued 
>> chaining events long enough, but I haven't confirmed it.
>> 
>> Also fwiw: the OMPI changes are confined to configuration/Makefile areas - 
>> we actually don't fiddle with the libevent code itself other than a couple 
>> of places where we test for stdbool.h before including it.
>> 
>> Thanks Nick!
>> Ralph
>> 
>>> 
>>> yrs,
>>> -- 
>>> Nick
>>> ***********************************************************************
>>> To unsubscribe, send an e-mail to majord...@freehaven.net with
>>> unsubscribe libevent-users    in the body.
>> 
> 
