I've been digging further into this, and I believe I have much of it resolved now. However, I have encountered a problem that appears to be something in libevent itself.
I configured libevent with debug enabled, and turned it on at execution - and was barraged by:

    [warn] select: Invalid argument

Digging further into the reason, I found that the message comes from the following code in select_dispatch (file select.c):

    res = select(nfds, sop->event_readset_out,
        sop->event_writeset_out, NULL, tv);

    EVBASE_ACQUIRE_LOCK(base, th_base_lock);

    check_selectop(sop);

    if (res == -1) {
        if (errno != EINTR) {
            event_warn("select");
            return (-1);
        }

        return (0);
    }

The timeout value being supplied to select_dispatch is being corrupted after the first time thru the routine - it comes into the routine the first time as {0, 0}, but is an illegal value thereafter. Resetting the timeout to the original value resolves the problem.

Obviously, removing debug "quiets" the message barrage - but I wonder if something else is going on here, or if there is a bug in libevent itself?

Thanks
Ralph

On Jan 6, 2012, at 12:47 PM, Ralph Castain wrote:

> Afraid I'm going to have to eat my words here, Nick. It looks like something
> is going on in the code - not entirely sure just where yet (mine or
> libevent). I've installed a clean version of 2.0.13 (removing everything but
> the glue) into OMPI, and the problems persist. I've also tried converting to
> a true fd-based event using pipes, and get the identical behavior.
>
> I'm going to spend some more time over the weekend looking at this before
> begging more of your time on it. I'm hoping to pin it down a little more for
> you, or at least provide an updated reproducer.
>
> Thanks again
> Ralph
>
> On Jan 6, 2012, at 8:15 AM, Ralph Castain wrote:
>
>>
>> On Jan 6, 2012, at 7:02 AM, Nick Mathewson wrote:
>>
>>> On Fri, Jan 6, 2012 at 7:24 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> If it helps, I have now confirmed that I *can* activate the t2 event
>>>> during the t1func callback in my example *provided* I called event_assign
>>>> on it prior to entering event_base_loop.
>>>> It is also okay for me to
>>>> event_add the t2 event during the callback - I am simply not allowed to
>>>> event_assign *and* activate it there.
>>>>
>>>> However, it is okay to assign the event during the callback so long as I
>>>> don't activate it until after I return.
>>>>
>>>> Seems a little strange to me - is this the intended behavior?
>>>
>>> Well, no, of course not.
>>>
>>> Looking at your code, the only weird thing I see at first glance is
>>> that you are calling event_add() on t1 and t2 -- you shouldn't be
>>> doing that. event_add() is only for events that you want libevent to
>>> poll or wait for, but waiting for EV_WRITE on fd -1 isn't
>>> well-defined. If you want to activate them yourself with
>>> event_active(), there's no need to event_add() them.
>>>
>>> That shouldn't be causing this problem, though, I think. (Unless it is?)
>>
>> BINGO! Indeed, event_add was the source of the trouble. My bad for not
>> understanding when event_add was required.
>>
>>>
>>> I just tried your test programs, though, and they worked okay for me
>>> on OSX and linux, using Libevent 2.0.13-stable and Libevent
>>> 2.0.14-stable.
>>>
>>> What platform are you running your tests on? Have you tried other
>>> platforms too? Does the outcome depend on which libevent backend is
>>> in use? Have you tried this with an unpatched Libevent, just to
>>> confirm that it's not introduced by any openmpi patches?
>>
>> FWICT, it is a corruption issue, and so it does indeed depend on platform
>> and backend - just a question of what memory location gets trounced.
>>
>> FWIW, I was conducting my tests on OSX and linux as well, using OMPI with
>> 2.0.13 underneath. I think the difference in our results is due to the
>> location issue - I suspect that you might also hit a problem if we continued
>> chaining events long enough, but I haven't confirmed it.
>>
>> Also fwiw: the OMPI changes are confined to configuration/Makefile areas -
>> we actually don't fiddle with the libevent code itself other than a couple
>> of places where we test for stdbool.h before including it.
>>
>> Thanks Nick!
>> Ralph
>>
>>>
>>> yrs,
>>> --
>>> Nick
>>> ***********************************************************************
>>> To unsubscribe, send an e-mail to majord...@freehaven.net with
>>> unsubscribe libevent-users in the body.
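
For anyone who wants to see the same "Invalid argument" outside of libevent, here is a minimal sketch in plain C. It assumes POSIX select() and nothing else. The helper names (select_errno_for, select_with_private_timeout, self_check) are made up for illustration and are not part of libevent. It only shows that select() rejects a timeval with a negative field, and that handing select() a private copy of the timeout leaves the caller's value intact. It does not identify what is overwriting the timeout in the case described in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper (not libevent): call select() with no descriptors
 * and the given timeout. Returns 0 on success, else the errno value. */
int select_errno_for(struct timeval tv)
{
    int res = select(0, NULL, NULL, NULL, &tv);
    return (res == -1) ? errno : 0;
}

/* Hypothetical helper (not libevent): pass select() a private copy of the
 * timeout, so the caller's struct is never modified by the call itself
 * (Linux, for example, updates the timeval it is handed). */
int select_with_private_timeout(int nfds, fd_set *rd, fd_set *wr,
                                const struct timeval *timeout)
{
    struct timeval copy;
    struct timeval *tvp = NULL;

    if (timeout != NULL) {
        copy = *timeout;
        tvp = &copy;
    }
    return select(nfds, rd, wr, NULL, tvp);
}

/* Small self-check tying the two helpers together. */
void self_check(void)
{
    struct timeval ok = {0, 0};    /* the value seen on the first pass */
    struct timeval bad = {-1, 0};  /* stands in for a corrupted value  */
    const struct timeval zero = {0, 0};

    assert(select_errno_for(ok) == 0);
    assert(select_errno_for(bad) == EINVAL);

    /* The caller's timeout is untouched after the call. */
    assert(select_with_private_timeout(0, NULL, NULL, &zero) == 0);
    assert(zero.tv_sec == 0 && zero.tv_usec == 0);
}
```

Calling self_check() from a small main() is enough to exercise it. The negative tv_sec is only a stand-in for whatever illegal value ends up in the timeout; the copy-before-call helper corresponds to the "reset the timeout to the original value" workaround, not to a fix for the underlying corruption.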