I am seeing the following very odd crash and I am unsure whether libevent
(the 1.4.4 currently integrated into the NetBSD source tree), my application,
or the NetBSD kernel is to blame.

What I am seeing is this: if I have enough clients active on my
application at once, eventually it dumps core with a SEGV in
libevent:kqueue.c:kq_dispatch.  This is unsurprising, because it's
trying to dereference 'ev->ev_events' (at line 250 in my kqueue.c file):

        if (!(ev->ev_events & EV_PERSIST))

Now, 'ev' has been set to events[i].udata in the loop just above (starting
at line 213):

        for (i = 0; i < res; i++) {

and "res" was set by this call to kevent:

        res = kevent(kqop->kq, changes, kqop->nchanges,
                     events, kqop->nevents, ts_p);

What is happening is that kevent is returning 10 into res, but it
appears that events[8] and events[9] are being filled in with 0.  So
ev->ev_events for index 8 is a null pointer dereference: boom.

Is it valid for kevent to return 10, but only fill in 8 entries in
events?  kqop->nevents is 64; kqop->nchanges is 0.

Or does this mean libevent fed two entries worth of zeroed-out udata
to the kernel, which has obligingly returned the same?  How could I
have caused this in my application -- or is it necessarily a bug in
libevent?

If I look at events[8] and events[9], not just the udata is 0 -- all
members of the structure are 0.  Should the kernel ever return events
like these?

If I go up a few stack frames and print out the active *base (I only
use one event base) I get this:

$27 = {evsel = 0xfbfd3638, evbase = 0xfbb07040, event_count = 68, 
event_count_active = 8, event_gotterm = 0, event_break = 0, activequeues = 
0xfbb10098, nactivequeues = 1, sig = {signalqueue = {tqh_first = 0x0, tqh_last 
= 0xfbb011a0}, ev_signal = {ev_next = {tqe_next = 0x0, tqe_prev = 0x0}, 
ev_active_next = { tqe_next = 0x0, tqe_prev = 0x0}, ev_signal_next = {tqe_next 
= 0x0, tqe_prev = 0x0}, min_heap_idx = 0, ev_base = 0x0, ev_fd = 0, ev_events = 
0, ev_ncalls = 0, ev_pncalls = 0x0, ev_timeout = {tv_sec = 0, tv_usec = 0}, 
ev_pri = 0, ev_callback = 0, ev_arg = 0x0, ev_res = 0, ev_flags = 0}, 
ev_signal_pair = {-1, -1}, ev_signal_added = 0, evsignal_caught = 0, 
evsigcaught = {0 <repeats 64 times>}, sh_old = 0x0, sh_old_max = 0}, eventqueue 
= {tqh_first = 0x805ec20, tqh_last = 0xfbb4b6d0}, event_tv = { tv_sec = 
1243003336, tv_usec = 438615}, timeheap = {p = 0x0, n = 0, a = 0}, tv_cache = 
{tv_sec = 0, tv_usec = 438615}}

To my very limited understanding, this looks OK -- and I note that
event_count_active in here is 8, which matches the number actually returned
as nonzero by kevent!  What's gotten out of sync here and how?

In other runs I see a much higher value for event_count.  I do not think I
am leaking events (I event_del everything when I tear down connections, and
at this point I never have more than 16 file descriptors open, with a
maximum of two live events each) so I wonder what this could mean and
what kinds of values are normal in there, too.

Can I safely make kq_dispatch ignore events that are all zeroed out?  Am
I doing something else obviously wrong?  Or is libevent, or the NetBSD
kernel, at fault?
I don't see any relevant-looking changes in the ChangeLog for libevent
since the 1.4.4 that's in NetBSD.
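
The workaround asked about above could look roughly like the sketch below.
It is not the kq_dispatch code from libevent: sk_event, sk_kevent,
SK_PERSIST and dispatch_guarded are hypothetical stand-ins so the example
compiles on its own, and the "deleted"/"activated" flags only mark where
the real loop would delete or activate the event. A guard like this would
only hide the symptom; it does not explain why the count and the filled-in
entries disagree.

```c
#include <assert.h>
#include <stddef.h>

#define SK_PERSIST 0x10  /* stand-in for libevent's EV_PERSIST */

/* Stand-ins for struct event and struct kevent (assumptions). */
struct sk_event  { short ev_events; int deleted; int activated; };
struct sk_kevent { void *udata; };

/* Walk the first res entries, skipping any whose udata is NULL instead
 * of dereferencing it. Returns the number handled; *skipped receives
 * the number of entries ignored. */
static int dispatch_guarded(struct sk_kevent *events, int res, int *skipped)
{
    int i, handled = 0;

    assert(events != NULL && skipped != NULL);
    *skipped = 0;

    for (i = 0; i < res; i++) {
        struct sk_event *ev = events[i].udata;

        if (ev == NULL) {       /* zeroed entry: nothing to dispatch */
            (*skipped)++;
            continue;
        }
        if (!(ev->ev_events & SK_PERSIST))
            ev->deleted = 1;    /* where the real loop would delete ev */
        ev->activated = 1;      /* where the real loop would activate ev */
        handled++;
    }
    return handled;
}
```

Reporting the skipped count rather than dropping the entries silently keeps
the mismatch visible while the underlying cause is tracked down.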

Thor
_______________________________________________
Libevent-users mailing list
Libevent-users@monkey.org
http://monkeymail.org/mailman/listinfo/libevent-users
