On Jan 13, 2012, at 8:29 AM, Nick Mathewson wrote:

> On Fri, Jan 13, 2012 at 10:13 AM, Ralph Castain <r...@open-mpi.org> wrote:
> 
>>> What kind of illegal value are you seeing,
>> 
>> 1326467251, 774650
> 
> Okay, that looks like it's the actual current time!  I wonder why that
> would make select() give an error, though.  Maybe because the current
> time plus that many seconds exceeds a 32-bit TIME_MAX ?

Best I can tell, that is correct - select thinks that is an offset, and the 
result is too large.
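
To make the arithmetic concrete, here is a minimal sketch. It assumes the
platform adds the timeout to the current time in a signed 32-bit seconds
counter, which is a guess and not confirmed anywhere in this thread. The
helper name is made up for illustration and is not part of Libevent.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a Libevent function): returns 1 if
 * now_sec + delay_sec no longer fits in a signed 32-bit seconds counter,
 * 0 otherwise.  Assumes both inputs are non-negative. */
static int deadline_overflows_time32(int64_t now_sec, int64_t delay_sec)
{
    assert(now_sec >= 0 && delay_sec >= 0);
    /* Do the sum in 64 bits so the check itself cannot overflow. */
    return now_sec + delay_sec > (int64_t)INT32_MAX;
}
```

With the value reported above, passing 1326467251 as both the current time
and the "delay" exceeds INT32_MAX, while an ordinary delay of a few seconds
does not.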

> 
>>> coming from where?
>> 
>> I'm not sure who calls "select_dispatch" - the value is passed into it.
> 
> The line is
>                 res = evsel->dispatch(base, tv_p);
> in event_base_loop() in event.c
> 
>>> Are you
>>> using the common_timeout code?
>> 
>> This is just flowing thru from a call to event_loop - I'm not sure of the 
>> progression that takes us down to select_dispatch.
> 
> I meant, is any part of your code calling
> event_base_init_common_timeout() ? It sounds like "no".

Nope

> 
> So, three possibilities come to mind:
>  1) Something is calling event_add with an absolute time rather than
> a number of seconds/usec to delay.
>  2) Something in Libevent is calling event_add_internal with an
> absolute time rather than a delay, and is not setting the
> tv_is_absolute flag
>  3) timeout_correct has gone crazy, and thinks that the current time
> has been reset to 0 for some reason.
> 
> Adding some assertions in event_add_internal might track this down.
> Trivially, you could do
>   if (tv && !tv_is_absolute) {
>       /* waiting one billion seconds should be enough for anyone */
>       EVUTIL_ASSERT(tv->tv_sec < 1000000000);
>   }
> 
> to try to detect 1 and 2.

Interesting. The above code never tripped, so I dug a little further and found 
that event_add_internal is never called with a large tv value. It does look 
like a race condition, though - sometimes the code completes and exits 
before I get the error report.

The timeout value clearly isn't a garbage value - I dumped the values out, 
compared to current time as of the error:

[warn] select: Invalid argument
TV OUT OF SPEC AT CNT 2: value 1326472513:976848 curtime 1326472513:977043
[warn] select: Invalid argument
TV OUT OF SPEC AT CNT 3: value 1326472513:977327 curtime 1326472513:977413

So the value is getting updated and appears valid. What's strange is that 
libevent is passing an absolute time to select, when the man page says the 
timeout is supposed to be a relative value:

     If timeout is a non-nil pointer, it specifies a maximum interval to
     wait for the selection to complete.  If timeout is a nil pointer, the
     select blocks indefinitely.  To effect a poll, the timeout argument
     should be non-nil, pointing to a zero-valued timeval structure.
     Timeout is not changed by select(), and may be reused on subsequent
     calls, however it is good style to re-initialize it before each
     invocation of select().

Any easy way I can output an identifier that would tell us something about 
which event is involved? I see that I'm not getting output from the event_debug 
calls in the code, even though I've configured with debug enabled and called:

        event_enable_debug_mode();
        event_set_debug_output(1);

Anything else required to get that output? Would it help?

> 
> -- 
> Nick
> ***********************************************************************
> To unsubscribe, send an e-mail to majord...@freehaven.net with
> unsubscribe libevent-users    in the body.

