On 08/28/2018 11:19 AM, Frederic Lecaille wrote:
On 08/27/2018 10:46 PM, PiBa-NL wrote:
Hi Frederic, Oliver,

Hi Pieter,

Thanks for your investigations :).
I've made a little reg-test (files attached). Its probably not 'correct' to commit as-is, but should be enough to get a reproduction.. I hope..

changing it to nbthread 1 makes it work every time..(that i tried)

Your script is correct. Thank you a lot for this Pieter.

The test actually seems to show a variety of issues.
## Every once in a while it takes like 7 seconds to run a test.. During which cpu usage is high..

Sounds like the first issue you reported. You can use varnishtest's -t option to set a large timeout so that you have enough time to interrupt varnishtest (Ctrl+C) before it kills haproxy. Then you can attach gdb to the haproxy process.

     ---- c0    7.6 HTTP rx timeout (fd:5 7500 ms)

## But most of the time, it just doesn't finish with a correct result (I've also seen haproxy dump core while testing..). There is of course the option that I did something wrong in the Lua as well...

Does the test itself work for you guys? (with nbthread 1)

I have not managed to make this script fail with "nbthread 1".

I have also seen coredumps with "nbthread 3" even with only one HTTP request from c0 client:


     client c0 -connect ${h1_fe1_sock} {
         txreq -url "/"
         rxresp
         expect resp.status == 200
     }

If you run varnishtest with the -l option, it leaves the temporary vtc.* directory in place when the test fails.

If you set up your environment to produce coredumps (ulimit -c unlimited), you should find coredump files in the /tmp/vtc.*/<haproxy instance name>/ directory (/tmp/vtc.*/h1/ in our case).

According to gdb we have an issue in src/ssl_sock.c, so I have CC'd this mail to Emeric:

Reading symbols from haproxy...done.
[New LWP 32432]
[New LWP 32431]
[New LWP 32428]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/flecaille/src/haproxy/haproxy -d -f /tmp/vtc.32410.6f80f987/h1/cfg'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f78f98bba56 in ASN1_get_object ()
    from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
[Current thread is 1 (Thread 0x7f78f8522700 (LWP 32432))]
(gdb) bt full
#0  0x00007f78f98bba56 in ASN1_get_object ()
    from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#1  0x00007f78f98c2ff8 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#2  0x00007f78f98c41b5 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#3  0x00007f78f98c4ead in ASN1_item_ex_d2i ()
    from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#4  0x00007f78f98c4f2b in ASN1_item_d2i ()
    from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#5  0x00007f78f9cdac98 in d2i_SSL_SESSION ()
    from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#6  0x000055e2078be006 in ssl_sock_init (conn=0x7f78e8012220) at src/ssl_sock.c:4985
         ptr = 0xf800000000000000 <error: Cannot access memory at address 0xf800000000000000>
         sess = <optimized out>
         may_retry = <optimized out>
         conn = 0x7f78e8012220
#7  0x000055e20797dfc1 in conn_xprt_init (conn=0x7f78e8012220)
     at include/proto/connection.h:84
         ret = 0
#8  tcp_connect_server (conn=0x7f78e8012220, data=0, delack=<optimized out>)
     at src/proto_tcp.c:545
         fd = 18
         srv = <optimized out>
         be = 0x55e207c567e0 <socket_proxy>
         src = <optimized out>
#9  0x000055e207981aba in si_connect (si=0x7f78e8017680)
     at include/proto/stream_interface.h:366
         ret = 0
#10 connect_server (s=s@entry=0x7f78e80173b0) at src/backend.c:1223
         cli_conn = 0x0
         srv_conn = 0x7f78e8012220
         srv_cs = <optimized out>
         old_cs = <optimized out>
         reuse = <optimized out>
         err = <optimized out>
#11 0x000055e207924295 in sess_update_stream_int (s=0x7f78e80173b0) at src/stream.c:885
         conn_err = <optimized out>
         si = 0x7f78e8017680
         req = 0x7f78e80173c0
#12 process_stream (t=<optimized out>, context=0x7f78e80173b0, state=<optimized out>)
     at src/stream.c:2240
         s = 0x7f78e80173b0
         sess = <optimized out>
         rqf_last = <optimized out>
         rpf_last = 2147483648
         rq_prod_last = <optimized out>
         rq_cons_last = <optimized out>
         rp_cons_last = <optimized out>
         rp_prod_last = <optimized out>
         req_ana_back = <optimized out>
         req = 0x7f78e80173c0
         res = 0x7f78e8017420
         si_f = 0x7f78e8017638
         si_b = 0x7f78e8017680
#13 0x000055e2079ab1f8 in process_runnable_tasks () at src/task.c:381
         t = <optimized out>
         state = <optimized out>
         ctx = <optimized out>
---Type <return> to continue, or q <return> to quit---
         process = <optimized out>
         t = <optimized out>
         max_processed = <optimized out>
#14 0x000055e207959c51 in run_poll_loop () at src/haproxy.c:2386
         next = <optimized out>
         exp = <optimized out>
#15 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2451
         ptif = <optimized out>
         ptdf = <optimized out>
         start_lock = 0
#16 0x00007f78f9f27494 in start_thread (arg=0x7f78f8522700) at pthread_create.c:333
         __res = <optimized out>
         pd = 0x7f78f8522700
         now = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140157538936576,
                -2788191698922495525, 0, 140727734888799, 0, 140157575090240,
                2864195697895809499, 2864194293716885979}, mask_was_saved = 0}},
           priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0,
               canceltype = 0}}}
         not_first_call = <optimized out>
         pagesize_m1 = <optimized out>
         sp = <optimized out>
         freesize = <optimized out>
         __PRETTY_FUNCTION__ = "start_thread"
#17 0x00007f78f9266acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
No locals

I am coming back to this issue first because it may lead to a memory corruption.

On my PC, if I add traces in ssl_sock_prepare_srv_ctx() when it is called by hlua_init(), global.nbthread equals 1 (the default value of global.nbthread before the configuration files are parsed), while it should be 3.

Note that we allocate exactly global.nbthread pointers to struct reused_sess in this function. Later, when we enter ssl_sock_init(), we initialize ptr (seen above in the trace) like this:

    const unsigned char *ptr = objt_server(conn->target)->ssl_ctx.reused_sess[tid].ptr;

When tid equals 0 or 1, haproxy does not crash. It crashes when tid equals 2.

So the bug here is that hlua_init() is called too soon, or at least that ssl_sock_prepare_srv_ctx() is, for the Lua SSL server socket.

I hope this bug is related to the other ones.

I am going to open another thread to post a patch.

Regards.
