On Tue, Apr 18, 2017 at 03:06:08PM +0200, Willy Tarreau wrote:
> On Tue, Apr 18, 2017 at 03:53:29PM +0300, Slawa Olhovchenkov wrote:
> > now_ms posible wrap (by design).
> > As result delta_ms calcultaed as `TICKS_TO_MS(tick_remain(now_ms,
> > exp)) + 1` can be negative. This caused negative
> > timeout.tv_sec/timeout.tv_nsec, failed kevent() (EINVAL) and infinite
> > loop.
>
> That's an interesting case that we've contemplated already a few weeks
> ago and didn't see how to make it fail. The only case where it could
> happen would be when there's a timeout larger than 24.3 days (2147483647
> ms to be precise) in the configuration so that a task is scheduled further
> in the future. But even with such an exact value in the config, the issue
> would last a single millisecond, as on the next millisecond the difference
> is reduced by one millisecond and doesn't overflow anymore.
>
> We still need to keep it in mind as a corner case to reproduce the bug
> though.
Today I am got this issuse on all 3 haproxy different hosts
simultaneously (about 07:38:57 UTC I am got 'kernel: sonewconn: pcb
0xfffff80040f781d0: Listen queue overflow: 1537 already in queue
awaiting acceptance (564 occurrences)').
2017-04-18 07:38:57 UTC = 1492501137 unix time.
1492501137 unix time = 1640 now_ms, i.e. issues cused around now_ms
wraps.
Track down by dtrace show negative timeout.tv_sec/timeout.tv_nsec.
I am don't know how to add loging to ev_kqueue.c for print
now_ms/exp/other stuff.
# grep time /usr/local/etc/haproxy.conf | sort | uniq
timeout client 100s
timeout client 50000
timeout client 50000ms
timeout connect 100s
timeout connect 5000
timeout connect 500ms
timeout queue 100s
timeout server 100s
timeout server 50000
timeout server 2000ms
timeout server 30s