Hi John,
On Thu, May 08, 2014 at 09:15:20AM +0200, John-Paul Bader wrote:
> Hey,
>
> so I have downloaded the haproxy-ss-Latest from the website and applied
> your patches. I have compiled it with:
>
> make TARGET=freebsd USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1
>
> It ran very good for 2 hours but then 6 out of 12 processes coredumped,
> this time however in the haproxy code realm and apparently session related:
Great, so good news and bad news at the same time. The good news is that
the shared context was definitely causing the trouble; the bad news is
that the lightly-tested kqueue poller still seems to have some trouble.
> Maybe the full backtrace is more helpful:
>
> (gdb) bt full
> #0 kill_mini_session (s=0x804269c00) at src/session.c:299
> level = 6
> conn = (struct connection *) 0x0
> err_msg = <value optimized out>
> #1 0x0000000000463928 in conn_session_complete (conn=0x8039f2a80) at
> src/session.c:355
> s = (struct session *) 0x804269c00
> #2 0x0000000000432769 in conn_fd_handler (fd=<value optimized out>) at
> src/connection.c:88
> conn = <value optimized out>
> flags = 41997063
> #3 0x00000000004127db in fd_process_polled_events (fd=<value optimized
> out>) at src/fd.c:271
> new_updt = <value optimized out>
> old_updt = 1
> #4 0x000000000046ed85 in _do_poll (p=<value optimized out>, exp=<value
> optimized out>)
> at src/ev_kqueue.c:141
> status = 1
> count = 0
> fd = <value optimized out>
> delta_ms = <value optimized out>
> timeout = {tv_sec = 0, tv_nsec = 27000000}
> updt_idx = <value optimized out>
> en = <value optimized out>
> eo = <value optimized out>
> changes = <value optimized out>
(...)
OK this trace tends to show that we were called for an event on
an FD in a strange state, half-open, half-closed :-/
If the fd were closed, conn_fd_handler() would simply have returned
because fd.owner == NULL. Since it managed to go on to
conn_session_complete(), it means that fd.owner was correct but
the connection was not attached to this session, which is quite strange.
It seems like something was deinitialized, but I can't find a
code sequence which could produce this.
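To make the reasoning above concrete, here is a minimal sketch in C. It is
not the actual haproxy code: the structs, the enum and classify_event() are
hypothetical, heavily simplified stand-ins that only model the three pointers
discussed here (fd owner, connection owner, and the session's link back to
its connection).

```c
#include <stddef.h>

/* Hypothetical simplified model, NOT the real haproxy structures. */
struct connection;

struct session {
    struct connection *conn;   /* connection the session believes it uses */
};

struct connection {
    struct session *owner;     /* session this connection was created for */
};

struct fd_entry {
    struct connection *owner;  /* NULL once the fd has been closed */
};

enum ev_state {
    EV_FD_CLOSED,     /* fd.owner == NULL: the handler just returns */
    EV_OK,            /* fd, connection and session all agree */
    EV_INCONSISTENT   /* fd.owner is valid but the session has lost the conn */
};

/* Classify the state in which a polled event finds an fd. */
static enum ev_state classify_event(const struct fd_entry *fde)
{
    const struct connection *conn = fde->owner;

    if (conn == NULL)
        return EV_FD_CLOSED;

    /* The session is reached through the connection, as in frame #1. */
    const struct session *s = conn->owner;

    if (s == NULL || s->conn != conn)
        return EV_INCONSISTENT;   /* the case the backtrace suggests */

    return EV_OK;
}
```

In this model the crash corresponds to EV_INCONSISTENT: the fd still points
to a connection and the connection still points to a session, but the session
no longer points back (conn = 0x0 in frame #0).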
Would you please retry with poll (for example by starting haproxy with
-dk, or by adding "nokqueue" to the global section)? I'm not really
convinced that a bug in kqueue could cause this :-/
Thanks
Willy