On Tue, Apr 18, 2017 at 03:06:08PM +0200, Willy Tarreau wrote:

> On Tue, Apr 18, 2017 at 03:53:29PM +0300, Slawa Olhovchenkov wrote:
> > now_ms posible wrap (by design).
> > As result delta_ms calcultaed as `TICKS_TO_MS(tick_remain(now_ms,
> > exp)) + 1` can be negative. This caused negative
> > timeout.tv_sec/timeout.tv_nsec, failed kevent() (EINVAL) and infinite
> > loop.
> 
> That's an interesting case that we've contemplated already a few weeks
> ago and didn't see how to make it fail. The only case where it could
> happen would be when there's a timeout larger than 24.3 days (2147483647
> ms to be precise) in the configuration so that a task is scheduled further
> in the future. But even with such an exact value in the config, the issue
> would last a single millisecond, as on the next millisecond the difference
> is reduced by one millisecond and doesn't overflow anymore.
> 
> We still need to keep it in mind as a corner case to reproduce the bug
> though.

Today I am got this issuse on all 3 haproxy different hosts
simultaneously (about 07:38:57 UTC I am got 'kernel: sonewconn: pcb
0xfffff80040f781d0: Listen queue overflow: 1537 already in queue
awaiting acceptance (564 occurrences)').

2017-04-18 07:38:57 UTC = 1492501137 unix time.
1492501137 unix time = 1640 now_ms, i.e. issues cused around now_ms
wraps.

Track down by dtrace show negative timeout.tv_sec/timeout.tv_nsec.

I am don't know how to add loging to ev_kqueue.c for print
now_ms/exp/other stuff.

# grep time /usr/local/etc/haproxy.conf | sort | uniq
        timeout client      100s
        timeout client  50000
        timeout client 50000ms
        timeout connect      100s
        timeout connect 5000
        timeout connect 500ms
        timeout queue   100s
        timeout server      100s
        timeout server  50000
        timeout server 2000ms
        timeout server 30s

Reply via email to