Hi,
On 2014-07-09 14:25, Willy Tarreau wrote:
Hi guys,
On Wed, Jul 09, 2014 at 07:39:27PM +0200, Lukas Tribus wrote:
> (gdb) run
> Starting program: /usr/bin/sudo haproxy -f /etc/haproxy/haproxy.cfg -p
> /var/run/haproxy.pid -sf 18904
> process 18911 is executing new program: /usr/local/sbin/haproxy-1.5.1
> [WARNING] 189/125516 (18911) : Setting tune.ssl.default-dh-param to
> 1024 by default, if your workload permits it you should set it to at
> least 2048. Please set a value>= 1024 to make this warning disappear.
>
> Program received signal SIGILL, Illegal instruction.
> 0x0808d9d6 in http_end_txn_clean_session (s=0x90e9e10) at
> src/proto_http.c:4835
> 4835 s->logs.bytes_in -= s->req->buf->i;
> (gdb) backtrace
> #0 0x0808d9d6 in http_end_txn_clean_session (s=0x90e9e10) at
> src/proto_http.c:4835
> #1 0x0808e0c7 in http_resync_states (s=0x90e9e10) at src/proto_http.c:5276
> #2 0x080910b8 in http_response_forward_body (s=0x90e9e10,
> res=0x90feaa0, an_bit=1048576) at src/proto_http.c:6645
> #3 0x080b05a1 in process_session (t=0x90c5cf8) at src/session.c:2053
> #4 0x080568a8 in process_runnable_tasks (next=0xbfcbbf9c) at src/task.c:238
> #5 0x0804c903 in run_poll_loop () at src/haproxy.c:1304
> #6 0x0804ef75 in main (argc=7, argv=0xbfcbc194) at src/haproxy.c:1638
The bug is in haproxy itself, I thought we may see something related to
either ssl or pcre. There is no post v1.5.1 crashfix in git, so this is
probably a new bug and it's also where my knowledge ends.
Willy? This one here is crashing in 1.5.1, but did not crash in -dev26.
I don't think it's really a bug. The crash is an illegal instruction, which
could be a lot of other things. It can very rarely be due to a bug because
the only way to jump to an area with bad code is to slightly corrupt a
function pointer (or the return stack pointer) and be unlucky enough for
the new pointer to still be inside a text section and point to invalid
opcodes. Note also that the text section cannot be modified by an overflow.
On recent x86 CPUs, invalid opcodes are very rare, and the nature of the
code makes it quickly self-realign because of its variable length and the
number of single-byte instructions. I'm not saying it's not possible, I'm
just saying that the probability is low.
Among the things I'm used to seeing cause illegal instructions, I can cite
overclocked CPUs and defective CPU caches. But if it does not happen in
-dev26 nor when changing some options, I'd start by ruling that out.
Manfred made a good comment about CPU=native, especially since Merton said
that compiling with 'make CFLAGS="-g -O0" TARGET=linux2628 USE_OPENSSL=1'
made it not crash. Merton, are you running on the machine you build on?
If not, you should not set the CPU, it will use "generic" which basically
is compatible with various CPUs of your architecture.
Also, please look below:
(gdb) backtrace full
#0 0x0808d9d6 in http_end_txn_clean_session (s=0x90e9e10) at
src/proto_http.c:4835
prev_status = 200
The relevant code does this:
/* don't count other requests' data */
s->logs.bytes_in -= s->req->buf->i;
s->logs.bytes_out -= s->rep->buf->i;
There's not even a function call or anything else that could accidentally
jump to invalid code, so we're crashing on code emitted by the compiler.
Thus, either the compiler produces code which this CPU cannot run, or the
CPU (or memory) is defective. I'd be tempted to consider the former first
since ECC tends to get rid of the latter on recent systems.
So the next step should be to rebuild without specifying "CPU=native" and
see what happens. If it fixes the issue, it would be nice to post the two
executables so that we can compare them.
Willy
I wanted to add my 2¢. Recently, I compiled some software (not HAProxy,
however) using CPU=native as a make option, in a virtual machine on a
Citrix XenServer 6.2. Although the software compiled successfully, it
failed on execution with a SIGILL. Recompiling the same piece of software
with CPU=core2 (or CPU=generic) yielded a fully functional binary, so it
is possible that the CPU instruction set is not properly detected by the
compiler (especially in virtualized environments).
Sincerely,
---
Edwin