On Jan 13, 2012, at 8:29 AM, Nick Mathewson wrote:

> On Fri, Jan 13, 2012 at 10:13 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>>> What kind of illegal value are you seeing,
>>
>> 1326467251, 774650
>
> Okay, that looks like it's the actual current time!  I wonder why that
> would make select() give an error, though.  Maybe because the current
> time plus that many seconds exceeds a 32-bit TIME_MAX ?

Best I can tell, that is correct - select thinks that is an offset, and the result is too large.

>
>>> coming from where?
>>
>> I'm not sure who calls "select_dispatch" - the value is passed into it.
>
> The line is
>                 res = evsel->dispatch(base, tv_p);
> in event_base_loop() in event.c
>
>>> Are you
>>> using the common_timeout code?
>>
>> This is just flowing thru from a call to event_loop - I'm not sure of the
>> progression that takes us down to select_dispatch.
>
> I meant, is any part of your code calling
> event_base_init_common_timeout() ?  It sounds like "no".

Nope

>
> So, three possibilities come to mind:
>   1) Something is calling event_add with an absolute time rather than
>      a number of seconds/usec to delay.
>   2) Something in Libevent is calling event_add_internal with an
>      absolute time rather than a delay, and is not setting the
>      tv_is_absolute flag
>   3) timeout_correct has gone crazy, and thinks that the current time
>      has been reset to 0 for some reason.
>
> Adding some assertions in event_add_internal might track this down.
> Trivially, you could do
>      if (tv && !tv_is_absolute) {
>           /* waiting one billion seconds should be enough for anyone */
>           EVUTIL_ASSERT(tv->tv_sec < 1000000000);
>      }
>
> to try to detect 1 and 2.

Interesting. The above code never tripped, so I dug a little further and found that event_add_internal is never being called with a tv value that is large. I did find it to be a race condition - sometimes the code completes and exits before I get the error condition report.

The timeout value clearly isn't a garbage value - I dumped the values out, compared to current time as of the error:

[warn] select: Invalid argument
TV OUT OF SPEC AT CNT 2: value 1326472513:976848 curtime 1326472513:977043
Ralph
[warn] select: Invalid argument
TV OUT OF SPEC AT CNT 3: value 1326472513:977327 curtime 1326472513:977413

So the value is getting updated and appears valid.
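
For anyone following along, the comparison described above (timeout value versus current time) can be sketched as a small standalone check. This is only an illustration under assumptions: the helper names `tv_well_formed` and `tv_looks_absolute` and the slack window are hypothetical and are not part of Libevent. The idea is to flag a timeval that is well-formed but sits within a few seconds of the wall clock, which is the pattern in the dump.

```c
#include <stdbool.h>
#include <sys/time.h>

/* Hypothetical helpers for illustration only; not part of Libevent. */

/* A struct timeval is well-formed when both fields are non-negative
 * and tv_usec is below one second. */
static bool tv_well_formed(const struct timeval *tv)
{
    return tv->tv_sec >= 0 && tv->tv_usec >= 0 && tv->tv_usec < 1000000;
}

/* Returns true when tv lies within slack_sec seconds of now. A value
 * that close to the wall clock is far more likely to be an absolute
 * timestamp than a delay someone actually asked for. */
static bool tv_looks_absolute(const struct timeval *tv,
                              const struct timeval *now,
                              long slack_sec)
{
    long long diff = (long long)tv->tv_sec - (long long)now->tv_sec;
    if (diff < 0)
        diff = -diff;
    return tv_well_formed(tv) && diff <= slack_sec;
}
```

Such a check could be called just before the dispatch call with `now` taken from gettimeofday(); the slack of, say, 60 seconds is an arbitrary choice for the sketch.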

What's strange is why libevent is passing an absolute time to select as it is supposed to be a relative value per the man page:

     If timeout is a non-nil pointer, it specifies a maximum interval to wait
     for the selection to complete.  If timeout is a nil pointer, the select
     blocks indefinitely.  To effect a poll, the timeout argument should be
     non-nil, pointing to a zero-valued timeval structure.  Timeout is not
     changed by select(), and may be reused on subsequent calls, however it is
     good style to re-initialize it before each invocation of select().

Any easy way I can output an identifier that would tell us something about which event is involved? I see that I'm not getting output from the event_debug calls in the code, even though I've configured with debug enabled and called:

    event_enable_debug_mode();
    event_set_debug_output(1);

Anything else required to get that output? Would it help?

>
> --
> Nick
> ***********************************************************************
> To unsubscribe, send an e-mail to majord...@freehaven.net with
> unsubscribe libevent-users    in the body.
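
As a footnote to the man page wording quoted in this message: select() wants an interval, so an absolute deadline has to be converted before the call. A minimal sketch of that conversion follows, assuming both inputs are normalized (tv_usec in 0..999999). The helper name `relative_timeout` is hypothetical and is not how Libevent does it internally.

```c
#include <sys/time.h>

/* Hypothetical helper, not part of Libevent: turns an absolute
 * deadline into the relative interval that select() expects.
 * Assumes deadline and now are normalized (0 <= tv_usec < 1000000). */
static struct timeval relative_timeout(const struct timeval *deadline,
                                       const struct timeval *now)
{
    struct timeval out;

    out.tv_sec = deadline->tv_sec - now->tv_sec;
    out.tv_usec = deadline->tv_usec - now->tv_usec;

    /* Borrow one second when the microsecond field went negative. */
    if (out.tv_usec < 0) {
        out.tv_sec -= 1;
        out.tv_usec += 1000000;
    }

    /* A deadline already in the past becomes a zero timeout (a poll). */
    if (out.tv_sec < 0) {
        out.tv_sec = 0;
        out.tv_usec = 0;
    }
    return out;
}
```

The result can then be passed by address as the last argument of select(). With the first pair of values from the dump (value 1326472513:976848, curtime 1326472513:977043) the deadline is already 195 usec in the past, so the helper yields a zero timeout.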