I am seeing the following very odd crash and I am unsure whether libevent (the 1.4.4 currently integrated into the NetBSD source tree), my application, or the NetBSD kernel is to blame.
What I am seeing is this: if I have enough clients active on my application at once, eventually it dumps core with a SEGV in libevent:kqueue.c:kq_dispatch. This is unsurprising, because it's trying to dereference 'ev->ev_events' (at line 250 in my kqueue.c file):

    if (!(ev->ev_events & EV_PERSIST))

Now, 'ev' has been set to events[i].udata in the loop just above (starting at line 213):

    for (i = 0; i < res; i++) {

and "res" was set by this call to kevent:

    res = kevent(kqop->kq, changes, kqop->nchanges,
        events, kqop->nevents, ts_p);

What is happening is that kevent is returning 10 into res, but it appears that events[8] and events[9] are being filled in with 0. So ev->ev_events for index 8 is a null pointer dereference: boom.

Is it valid for kevent to return 10, but only fill in 8 entries in events? kqop->nevents is 64; kqop->nchanges is 0. Or does this mean libevent fed two entries worth of zeroed-out udata to the kernel, which has obligingly returned the same? How could I have caused this in my application -- or is it necessarily a bug in libevent?

If I look at events[8] and events[9], not just the udata is 0 -- all members of the structure are 0. Should the kernel ever return events like these?
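To make concrete what I mean by an entry whose members are all 0, here is a minimal sketch of the kind of check I have in mind. The name is_all_zero is my own invention, not anything in libevent or NetBSD, and comparing raw bytes assumes that any padding inside the structure is zeroed too.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of libevent): returns 1 if the n bytes
 * starting at p are all zero, 0 otherwise.  An empty range counts as
 * all zero.
 */
static int
is_all_zero(const void *p, size_t n)
{
	const unsigned char *b = p;
	size_t i;

	for (i = 0; i < n; i++) {
		if (b[i] != 0)
			return 0;
	}
	return 1;
}
```

Inside the loop in kq_dispatch it would be called as is_all_zero(&events[i], sizeof(events[i])), before 'ev' is ever dereferenced.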
If I go up a few stack frames and print out the active *base (I only use one event base) I get this:

    $27 = {evsel = 0xfbfd3638, evbase = 0xfbb07040, event_count = 68,
      event_count_active = 8, event_gotterm = 0, event_break = 0,
      activequeues = 0xfbb10098, nactivequeues = 1,
      sig = {signalqueue = {tqh_first = 0x0, tqh_last = 0xfbb011a0},
        ev_signal = {ev_next = {tqe_next = 0x0, tqe_prev = 0x0},
          ev_active_next = {tqe_next = 0x0, tqe_prev = 0x0},
          ev_signal_next = {tqe_next = 0x0, tqe_prev = 0x0},
          min_heap_idx = 0, ev_base = 0x0, ev_fd = 0, ev_events = 0,
          ev_ncalls = 0, ev_pncalls = 0x0,
          ev_timeout = {tv_sec = 0, tv_usec = 0}, ev_pri = 0,
          ev_callback = 0, ev_arg = 0x0, ev_res = 0, ev_flags = 0},
        ev_signal_pair = {-1, -1}, ev_signal_added = 0,
        evsignal_caught = 0, evsigcaught = {0 <repeats 64 times>},
        sh_old = 0x0, sh_old_max = 0},
      eventqueue = {tqh_first = 0x805ec20, tqh_last = 0xfbb4b6d0},
      event_tv = {tv_sec = 1243003336, tv_usec = 438615},
      timeheap = {p = 0x0, n = 0, a = 0},
      tv_cache = {tv_sec = 0, tv_usec = 438615}}

To my very limited understanding, this looks OK -- and I note that event_count_active in here is 8, which matches the number actually returned as nonzero by kevent! What's gotten out of sync here and how?

In other runs I see a much higher value for event_count. I do not think I am leaking events (I event_del everything when I tear down connections, and at this point I never have more than 16 file descriptors open, with a maximum of two live events each) so I wonder what this could mean and what kinds of values are normal in there, too.

Can I safely make kq_dispatch ignore events that are all zeroed out? Am I doing something else obviously wrong? Is libevent, or the NetBSD kernel? I don't see any relevant-looking changes in the ChangeLog for libevent since the 1.4.4 that's in NetBSD.

Thor
_______________________________________________
Libevent-users mailing list
Libevent-users@monkey.org
http://monkeymail.org/mailman/listinfo/libevent-users