On 08/27/2018 10:46 PM, PiBa-NL wrote:
Hi Frederic, Oliver,

Hi Pieter,

Thanks for your investigations :).
I've made a little reg-test (files attached). Its probably not 'correct' to commit as-is, but should be enough to get a reproduction.. I hope..

changing it to nbthread 1 makes it work every time..(that i tried)

Your script is correct. Thank you a lot for this Pieter.

The test actually seems to show a variety of issues.
## Every once in a while it takes like 7 seconds to run a test.. During which cpu usage is high..

Sounds like the first issue you reported. You can use -t varnistest option to set a large timeout so that you might have enough time to kill varnistest (Ctrl+C) to prevent it to kill haproxy. Then you can attach gdb to the haproxy process.

     ---- c0    7.6 HTTP rx timeout (fd:5 7500 ms)

## But most of the time, it just doesn't finish with a correct result (ive seen haproxy do core dumps also while testing..). There is of course the option that i did something wrong in the lua as well...

Does the test itself work for you guys? (with nbthread 1)

I have not managed to make this script fails with "nbthread 1".

I have also seen coredumps with "nbthread 3" even with only one HTTP request from c0 client:


    client c0 -connect ${h1_fe1_sock} {
        txreq -url "/"
        rxresp
        expect resp.status == 200
    }

If you run varnishtest with -l option, it leaves the temporary vtc.* directory if the test failed.

If you set your environment to produce coredumps (ulimit -c unlimited) you should find coredump files in /tmp/vtc.*/<haproxy instance name>/ directory (/tmp/vtc.*/h1/ in our case).

According to gdb we have an issue in src/ssl_sock.c. So I have CC this mail to Emeric:

Reading symbols from haproxy...done.
[New LWP 32432]
[New LWP 32431]
[New LWP 32428]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/flecaille/src/haproxy/haproxy -d -f /tmp/vtc.32410.6f80f987/h1/cfg'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f78f98bba56 in ASN1_get_object ()
   from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
[Current thread is 1 (Thread 0x7f78f8522700 (LWP 32432))]
(gdb) bt full
#0  0x00007f78f98bba56 in ASN1_get_object ()
   from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#1 0x00007f78f98c2ff8 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#2 0x00007f78f98c41b5 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#3  0x00007f78f98c4ead in ASN1_item_ex_d2i ()
   from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#4  0x00007f78f98c4f2b in ASN1_item_d2i ()
   from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#5  0x00007f78f9cdac98 in d2i_SSL_SESSION ()
   from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#6 0x000055e2078be006 in ssl_sock_init (conn=0x7f78e8012220) at src/ssl_sock.c:4985 ptr = 0xf800000000000000 <error: Cannot access memory at address 0xf800000000000000>
        sess = <optimized out>
        may_retry = <optimized out>
        conn = 0x7f78e8012220
#7  0x000055e20797dfc1 in conn_xprt_init (conn=0x7f78e8012220)
    at include/proto/connection.h:84
        ret = 0
#8  tcp_connect_server (conn=0x7f78e8012220, data=0, delack=<optimized out>)
    at src/proto_tcp.c:545
        fd = 18
        srv = <optimized out>
        be = 0x55e207c567e0 <socket_proxy>
        src = <optimized out>
#9  0x000055e207981aba in si_connect (si=0x7f78e8017680)
    at include/proto/stream_interface.h:366
        ret = 0
#10 connect_server (s=s@entry=0x7f78e80173b0) at src/backend.c:1223
        cli_conn = 0x0
        srv_conn = 0x7f78e8012220
        srv_cs = <optimized out>
        old_cs = <optimized out>
        reuse = <optimized out>
        err = <optimized out>
#11 0x000055e207924295 in sess_update_stream_int (s=0x7f78e80173b0) at src/stream.c:885
        conn_err = <optimized out>
        si = 0x7f78e8017680
        req = 0x7f78e80173c0
#12 process_stream (t=<optimized out>, context=0x7f78e80173b0, state=<optimized out>)
    at src/stream.c:2240
        s = 0x7f78e80173b0
        sess = <optimized out>
        rqf_last = <optimized out>
        rpf_last = 2147483648
        rq_prod_last = <optimized out>
        rq_cons_last = <optimized out>
        rp_cons_last = <optimized out>
        rp_prod_last = <optimized out>
        req_ana_back = <optimized out>
        req = 0x7f78e80173c0
        res = 0x7f78e8017420
        si_f = 0x7f78e8017638
        si_b = 0x7f78e8017680
#13 0x000055e2079ab1f8 in process_runnable_tasks () at src/task.c:381
        t = <optimized out>
        state = <optimized out>
        ctx = <optimized out>
---Type <return> to continue, or q <return> to quit---
        process = <optimized out>
        t = <optimized out>
        max_processed = <optimized out>
#14 0x000055e207959c51 in run_poll_loop () at src/haproxy.c:2386
        next = <optimized out>
        exp = <optimized out>
#15 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2451
        ptif = <optimized out>
        ptdf = <optimized out>
        start_lock = 0
#16 0x00007f78f9f27494 in start_thread (arg=0x7f78f8522700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7f78f8522700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140157538936576,
-2788191698922495525, 0, 140727734888799, 0, 140157575090240, 2864195697895809499, 2864194293716885979}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0,
              canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#17 0x00007f78f9266acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
No locals

Did i do something crazy in the lua code? , i do have several loops.. but i don't think thats where it 'hangs' ?..

Well, I am not a LUA expert ;). But even if wrong your code should not make haproxy crash. Let's continue to investigate.

Regards,

PiBa-NL (Pieter)



Reply via email to