Hi,

On 2014-07-09 14:25, Willy Tarreau wrote:
Hi guys,

On Wed, Jul 09, 2014 at 07:39:27PM +0200, Lukas Tribus wrote:
> (gdb) run
> Starting program: /usr/bin/sudo haproxy -f /etc/haproxy/haproxy.cfg -p
> /var/run/haproxy.pid -sf 18904
> process 18911 is executing new program: /usr/local/sbin/haproxy-1.5.1
> [WARNING] 189/125516 (18911) : Setting tune.ssl.default-dh-param to
> 1024 by default, if your workload permits it you should set it to at
> least 2048. Please set a value>= 1024 to make this warning disappear.
>
> Program received signal SIGILL, Illegal instruction.
> 0x0808d9d6 in http_end_txn_clean_session (s=0x90e9e10) at
> src/proto_http.c:4835
> 4835 s->logs.bytes_in -= s->req->buf->i;
> (gdb) backtrace
> #0 0x0808d9d6 in http_end_txn_clean_session (s=0x90e9e10) at
> src/proto_http.c:4835
> #1 0x0808e0c7 in http_resync_states (s=0x90e9e10) at src/proto_http.c:5276
> #2 0x080910b8 in http_response_forward_body (s=0x90e9e10,
> res=0x90feaa0, an_bit=1048576) at src/proto_http.c:6645
> #3 0x080b05a1 in process_session (t=0x90c5cf8) at src/session.c:2053
> #4 0x080568a8 in process_runnable_tasks (next=0xbfcbbf9c) at src/task.c:238
> #5 0x0804c903 in run_poll_loop () at src/haproxy.c:1304
> #6 0x0804ef75 in main (argc=7, argv=0xbfcbc194) at src/haproxy.c:1638

The bug is in haproxy itself; I thought we might see something related to
either SSL or PCRE. There is no post-v1.5.1 crash fix in git, so this
is probably a new bug, and it's also where my knowledge ends.


Willy? This one here is crashing in 1.5.1, but did not crash in -dev26.

I don't think it's really a bug. The crash is an illegal instruction, which can have many other causes. It is only very rarely due to a software bug, because
the only way to jump to an area with bad code is to slightly corrupt a
function pointer (or the return stack pointer) and be unlucky enough for
the new pointer to still be inside a text section and point to invalid
opcodes. Note also that the text section cannot be modified by an overflow. On recent x86 CPUs, invalid opcodes are very rare, and after a bad jump the instruction stream quickly realigns itself, because instructions have variable lengths and many of them are a single byte. I'm not saying it's not possible, I'm
just saying that the probability is low.

Among the usual causes of illegal instructions I've seen, I can cite overclocked CPUs and defective CPU caches. But if it does not happen with -dev26
or when some options are changed, I'd rule that out first.

Manfred made a good comment about CPU=native, especially since Merton said that compiling with 'make CFLAGS="-g -O0" TARGET=linux2628 USE_OPENSSL=1' made it stop crashing. Merton, are you running it on the machine you build on? If not, you should not set CPU at all; it will then default to "generic", which is basically
compatible with the various CPUs of your architecture.
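One way to answer the "same machine?" question without guessing is to compare the CPU feature flags of the build host and of the host that runs the binary. A minimal sketch in C, assuming Linux on x86 where the "flags" line of /proc/cpuinfo lists the extensions the (possibly virtual) CPU exposes; the helpers has_cpu_flag() and report_missing_flags() are hypothetical names, not part of HAProxy:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` occurs as a whole whitespace-separated word in
 * `flags` (for example the "flags" line of /proc/cpuinfo), else 0. */
int has_cpu_flag(const char *flags, const char *flag)
{
    size_t len;
    const char *p;

    assert(flags != NULL && flag != NULL);
    len = strlen(flag);
    if (len == 0)
        return 0;

    /* Skip partial matches such as "sse3" inside "ssse3". */
    for (p = strstr(flags, flag); p != NULL; p = strstr(p + 1, flag)) {
        int starts_word = (p == flags) || isspace((unsigned char)p[-1]);
        int ends_word = p[len] == '\0' || isspace((unsigned char)p[len]);
        if (starts_word && ends_word)
            return 1;
    }
    return 0;
}

/* Print each flag listed in `build_flags` (build host) that is absent
 * from `run_flags` (run host) and return how many were missing.
 * Input longer than the local buffer is silently truncated. */
int report_missing_flags(const char *build_flags, const char *run_flags)
{
    char buf[4096];
    char *tok;
    int missing = 0;

    /* strtok() writes into its input, so tokenize a private copy. */
    snprintf(buf, sizeof(buf), "%s", build_flags);
    for (tok = strtok(buf, " \t\n"); tok != NULL; tok = strtok(NULL, " \t\n")) {
        if (!has_cpu_flag(run_flags, tok)) {
            printf("not available on the run host: %s\n", tok);
            missing++;
        }
    }
    return missing;
}
```

On each host, `grep -m1 '^flags' /proc/cpuinfo` prints the line; pass the part after the colon to report_missing_flags(). Any flag it reports is an extension that a CPU=native build on the first host may use and that the second host would reject with SIGILL.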

Also, please look below:

*(gdb) backtrace full*
#0  0x0808d9d6 in http_end_txn_clean_session (s=0x90e9e10) at
src/proto_http.c:4835
        prev_status = 200

The relevant code does this:

        /* don't count other requests' data */
        s->logs.bytes_in  -= s->req->buf->i;
        s->logs.bytes_out -= s->rep->buf->i;

There isn't even a function call or anything else that could accidentally jump to invalid code, so we're crashing on code emitted by the compiler. Thus, either the compiler produced code which this CPU cannot run, or the CPU (or memory) is defective. I'd be tempted to consider the former first, since ECC tends
to get rid of the latter on recent systems.

So the next step should be to rebuild without specifying "CPU=native" and see what happens. If it fixes the issue, it would be nice to post the two
executables so that we can compare them.

Willy

I wanted to add my 2¢. Recently, I compiled some software (not HAProxy, however) with CPU=native as a make option, in a virtual machine on Citrix XenServer 6.2. Although the software compiled successfully, it failed on execution with a SIGILL. Recompiling the same software with CPU=core2 (or CPU=generic) yielded a fully functional binary, so it is possible that the compiler does not properly detect the CPU instruction set, especially in virtualized environments.
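If you have to ship a binary built with -march=native, a startup check can at least turn some of these crashes into a readable message. A minimal sketch, assuming GCC or clang on x86: the compiler predefines macros such as __AVX2__ or __SSE4_2__ when -march allows those extensions, and __builtin_cpu_supports() asks the running CPU whether it reports them. The function name count_isa_mismatches() is hypothetical, and the list of extensions checked is an illustrative subset. The check only sees the flags its own file was compiled with, and the compiler could in principle use those extensions inside the check itself, so treat it as a diagnostic aid rather than a guarantee:

```c
#include <stdio.h>

/* Count ISA extensions that the compiler was allowed to use for this
 * file (signalled by predefined macros such as __AVX2__) but that the
 * CPU running the program does not report. Returns 0 on non-x86 or
 * non-GCC/clang builds, where nothing is checked. */
int count_isa_mismatches(void)
{
    int missing = 0;

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* __builtin_cpu_supports() requires a string literal, hence a macro
     * instead of a loop over a table of names. */
#define REQUIRE_ISA(feat)                                            \
    do {                                                             \
        if (!__builtin_cpu_supports(feat)) {                         \
            fprintf(stderr, "built for %s, but this CPU lacks it\n", \
                    feat);                                           \
            missing++;                                               \
        }                                                            \
    } while (0)

    __builtin_cpu_init();
#ifdef __SSSE3__
    REQUIRE_ISA("ssse3");
#endif
#ifdef __SSE4_1__
    REQUIRE_ISA("sse4.1");
#endif
#ifdef __SSE4_2__
    REQUIRE_ISA("sse4.2");
#endif
#ifdef __POPCNT__
    REQUIRE_ISA("popcnt");
#endif
#ifdef __AVX__
    REQUIRE_ISA("avx");
#endif
#ifdef __AVX2__
    REQUIRE_ISA("avx2");
#endif
#undef REQUIRE_ISA
#endif /* x86 with GCC or clang */

    return missing;
}
```

Calling it first thing in main() and exiting when it returns a non-zero count makes a mismatched build fail with a message naming the missing extension instead of a bare "Illegal instruction".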

Sincerely,
---
Edwin
