Hello,

first off - thank you very much for such a wonderful piece of software! I'm using HAProxy on Linux/x86 and it's been a wonderful experience so far.

Things are a bit different on z/Linux, using s390 architecture. HAProxy was running very nicely for several hours and then suddenly started to segfault and hang after I had to restart it.

Here's some background information:

$ uname -a
Linux vlbt12 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:58 EDT 2008 s390x s390x s390x GNU/Linux
$

$ ./src/haproxy-1.4.2/haproxy -vv
HA-Proxy version 1.4.2 2010/03/17
Copyright 2000-2010 Willy Tarreau <w...@1wt.eu>

Build options :
  TARGET  = linux26
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g
  OPTIONS =

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes

Available polling systems :
     sepoll : pref=400,  test result OK
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 4 (4 usable), will use sepoll.

$

Configuration file:

global
    log 127.0.0.1:514 local0 debug

defaults
    log global
    option httpclose
    timeout connect 5000
    timeout client 5000
    timeout server 5000

backend http-plain-bck
    mode http
    balance roundrobin
    server httpplain01 127.0.0.1:7081 check inter 2s rise 2 fall 2
    server httpplain02 127.0.0.1:7082 check inter 2s rise 2 fall 2
    option httpchk GET /http-check


frontend plain
    bind 0.0.0.0:37080
    mode http
    maxconn 300
    default_backend http-plain-bck
    option httplog


The client application is SoapUI http://soapui.com - they're using an embedded Jakarta Commons HTTP client 3.0.1 underneath. The servers are IBM's proprietary Message Brokers. I'm sending HTTP POST data which is then proxied over two backends.

I've read http://www.formilux.org/archives/haproxy/0910/2562.html but I think my case is a bit different as I have no problems with starting HAProxy. And unfortunately I'm not able to use Valgrind as it seems not to be supported on that architecture.

When the 'hang' happens it's just that - the client waits for several dozens of seconds and I'm able to send the SIGQUIT signal, here's the output from HAProxy upon receiving it:

$ ./src/haproxy-1.4.2/haproxy -V -f ./problematic.conf
Available polling systems :
     sepoll : pref=400,  test result OK
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 4 (4 usable), will use sepoll.
Using sepoll() as the polling mechanism.

Dumping pools usage.
  - Pool pipe (32 bytes) : 0 allocated (0 bytes), 0 used, 2 users [SHARED]
- Pool capture (64 bytes) : 0 allocated (0 bytes), 0 used, 1 users [SHARED] - Pool task (144 bytes) : 6 allocated (864 bytes), -19793676 used, 1 users [SHARED] - Pool hdr_idx (832 bytes) : 4 allocated (3328 bytes), -19793678 used, 2 users [SHARED] - Pool requri (1024 bytes) : 3 allocated (3072 bytes), 2 used, 1 users [SHARED] - Pool session (1344 bytes) : 3 allocated (4032 bytes), -19793678 used, 1 users [SHARED] - Pool buffer (16512 bytes) : 8 allocated (132096 bytes), -39587356 used, 1 users [SHARED]
Total: 7 pools, 143392 bytes allocated, 9081850944 used.

I'm not sure how to interpret it but the numbers look a bit odd, don't they? 143392 bytes have been allocated yet 9081850944 have been used?

And here's what syslog shows just before I send SIGQUIT

Mar 19 19:29:22 localhost haproxy[22945]: 172.150.21.62:4447 [19/Mar/2010:19:29:22.448] plain http-plain-bck/httpplain02 0/0/0/0/1 404 428 - - ---- 2/2/2/1/0 0/0 "POST /customer/profile HTTP/1.1" Mar 19 19:29:22 localhost haproxy[22945]: 172.150.21.62:4448 [19/Mar/2010:19:29:22.449] plain http-plain-bck/httpplain01 0/0/0/0/1 404 428 - - ---- 1/1/1/0/0 0/0 "POST /customer/profile HTTP/1.1" Mar 19 19:29:22 localhost haproxy[22945]: 172.150.21.62:4449 [19/Mar/2010:19:29:22.480] plain http-plain-bck/httpplain02 0/0/0/0/1 404 428 - - ---- 2/2/2/1/0 0/0 "POST /customer/profile HTTP/1.1" Mar 19 19:29:22 localhost haproxy[22945]: 172.150.21.62:4450 [19/Mar/2010:19:29:22.481] plain http-plain-bck/httpplain01 0/0/0/0/1 404 428 - - ---- 1/1/1/0/0 0/0 "POST /customer/profile HTTP/1.1" Mar 19 19:29:22 localhost haproxy[22945]: 172.150.21.62:4453 [19/Mar/2010:19:29:22.494] plain plain/<NOSRV> -1/-1/-1/-1/0 400 187 - - CR-- 3/3/0/0/0 0/0 "<BADREQ>" Mar 19 19:29:22 localhost haproxy[22945]: 172.150.21.62:4452 [19/Mar/2010:19:29:22.489] plain http-plain-bck/httpplain01 0/0/0/31/32 404 428 - - ---- -18798/-18798/2/0/0 0/0 "POST /customer/profile HTTP/1.1"

HTTP 404 response code is perfectly fine in this case.

But it's not always going into a hung state, sometimes it simply segfaults after processing a couple of requests - here's what gets logged in syslog when it happens:

Mar 19 20:01:34 localhost haproxy[23143]: Proxy http-plain-bck started.
Mar 19 20:01:34 localhost haproxy[23143]: Proxy plain started.
Mar 19 20:20:55 localhost haproxy[23241]: Proxy http-plain-bck started.
Mar 19 20:20:55 localhost haproxy[23241]: Proxy plain started.
Mar 19 20:20:59 localhost haproxy[23241]: 172.150.21.62:4624 [19/Mar/2010:20:20:59.301] plain http-plain-bck/httpplain01 0/0/0/1/2 404 428 - - ---- 2/2/1/0/0 0/0 "POST /customer/profile HTTP/1.1" Mar 19 20:20:59 localhost haproxy[23241]: 172.150.21.62:4625 [19/Mar/2010:20:20:59.302] plain http-plain-bck/httpplain02 1/0/0/0/2 404 428 - - ---- 1/1/1/0/0 0/0 "POST /customer/profile HTTP/1.1" Mar 19 20:20:59 localhost haproxy[23241]: 172.150.21.62:4626 [19/Mar/2010:20:20:59.303] plain http-plain-bck/httpplain01 0/0/0/0/2 404 428 - - ---- 0/0/0/0/0 0/0 "POST /customer/profile HTTP/1.1" Mar 19 20:20:59 localhost haproxy[23241]: 172.150.21.62:4627 [19/Mar/2010:20:20:59.309] plain http-plain-bck/httpplain02 1/0/0/0/2 404 428 - - ---- 2/2/1/0/0 0/0 "POST /customer/profile HTTP/1.1" Mar 19 20:20:59 localhost haproxy[23241]: 172.150.21.62:4628 [19/Mar/2010:20:20:59.310] plain http-plain-bck/httpplain01 0/0/0/0/1 404 428 - - ---- 1/1/1/0/0 0/0 "POST /customer/profile HTTP/1.1"


I have attached some basic data from a gdb session. I'd love to give more information but I'm not sure what I should do next, can you please guide me a bit here? What should I take a look at now?

Cheers!

--
Dariusz Suchojad
$ gdb ./src/haproxy-1.4.2/haproxy
GNU gdb Red Hat Linux (6.5-37.el5rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "s390x-redhat-linux-gnu"...Using host libthread_db 
library "/lib64/libthread_db.so.1".

(gdb) run -V -f ./problematic.conf
Starting program: /home/dsuch/haproxy/src/haproxy-1.4.2/haproxy -V -f 
./problematic.conf
Available polling systems :
     sepoll : pref=400,  test result OK
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 4 (4 usable), will use sepoll.
Using sepoll() as the polling mechanism.

Program received signal SIGSEGV, Segmentation fault.
session_free (s=0x800c4210) at src/session.c:101
101             LIST_DEL(&s->list);
(gdb) bt
#0  session_free (s=0x800c4210) at src/session.c:101
#1  0x000000008003c7ce in process_session (t=0x800c3970) at src/session.c:1632
#2  0x0000000080009274 in process_runnable_tasks (next=0x3ffffbdd084) at 
src/task.c:234
#3  0x0000000080002c9a in run_poll_loop () at src/haproxy.c:966
#4  0x0000000080004712 in main (argc=<value optimized out>, argv=0x3ffffbdd378) 
at src/haproxy.c:1240
(gdb) list
96                      LIST_INIT(&bref->users);
97                      if (s->list.n != &sessions)
98                              LIST_ADDQ(&LIST_ELEM(s->list.n, struct session 
*, list)->back_refs, &bref->users);
99                      bref->ref = s->list.n;
100             }
101             LIST_DEL(&s->list);
102             pool_free2(pool2_session, s);
103
104             /* We may want to free the maximum amount of pools if the proxy 
is stopping */
105             if (fe && unlikely(fe->state == PR_STSTOPPED)) {
(gdb) print s
$1 = (struct session *) 0x800c4210
(gdb) print *s
$2 = {list = {n = 0x0, p = 0x8007c760}, back_refs = {n = 0x800c4220, p = 
0x800c4220}, task = 0x800c3970, listener = 0x80098460, fe = 0x8008f7c0,
  be = 0x8007e420, conn_retries = 3, flags = 14, term_trace = 0, req = 
0x800c4ab0, rep = 0x800c8b40, si = {{state = 9, prev_state = 7, owner = 
0x800c3970,
      fd = 9, flags = 16, exp = 0, update = 0x8002afc4 
<stream_sock_data_finish>, shutr = 0x8002adfc <stream_sock_shutr>,
      shutw = 0x8002aea0 <stream_sock_shutw>, chk_rcv = 0x8002aaf4 
<stream_sock_chk_rcv>, chk_snd = 0x8002b9d0 <stream_sock_chk_snd>, connect = 0,
      iohandler = 0, ib = 0x800c4ab0, ob = 0x800c8b40, err_type = 0, err_loc = 
0x0, private = 0x0, st0 = 0, st1 = 0}, {state = 9, prev_state = 7,
      owner = 0x800c3970, fd = 13, flags = 16, exp = 0, update = 0x8002afc4 
<stream_sock_data_finish>, shutr = 0x8002adfc <stream_sock_shutr>,
      shutw = 0x8002aea0 <stream_sock_shutw>, chk_rcv = 0x8002aaf4 
<stream_sock_chk_rcv>, chk_snd = 0x8002b9d0 <stream_sock_chk_snd>,
      connect = 0x800385f8 <tcpv4_connect_server>, iohandler = 0, ib = 
0x800c8b40, ob = 0x800c4ab0, err_type = 0, err_loc = 0x0, private = 0x0, st0 = 
0,
      st1 = 0}}, cli_addr = {ss_family = 2, __ss_align = 0,
    __ss_padding = 
"\000\000\000\000\000\000\000\t\000\000\000\000\200\004â�...@\000\000\000\000\200\000▒0\000\000\003▒▒▒▒▒\000\000\000\000\000\000\001▒\000\000\000\000\200\bt\000\000\000\000\000\200\a▒`\000\000\000\000\200\004▒▒\000\000\000\000\000\000\000\t\000\000\000\000\000\000\000\002\000\000\000\000\000\000?a",
 '\0' <repeats 19 times>, "L▒.▒p"}, frt_addr = {ss_family = 0, __ss_align = 
0, __ss_padding = '\0' <repeats 111 times>}, srv_addr = {sin_family = 2,
    sin_port = 7082, sin_addr = {s_addr = 2130706433}, sin_zero = 
"\000\000\000\000\000\000\000"}, srv = 0x8008b410, srv_conn = 0x0, prev_srv = 
0x0,
  pend_pos = 0x0, txn = {req = {msg_state = 36, col = 219, sov = 219, eoh = 217,
      sol = 0x800c4b30 "POST /customer/profile HTTP/1.1\r\nConnection: 
close\r\nContent-Type: text/xml;charset=UTF-8\r\nSOAPAction: 
get_customer_profile\r\nUser-Agent: Jakarta Commons-HttpClient/3.0.1\r\nHost: 
10.152.61.39:37080\r\nCont"...,
      eol = 0x800c4c07 "\r\n\r\n<soapenv:Envelope 
xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\"; 
xmlns:cus=\"http://foobar/customer\";>\n   <soapenv:Header/>\n   
<soapenv:Body>\n      <cus:get_customer_profile_request>\n    "..., som = 219, 
err_pos = -2, sl = {rq = {l = 31, m_l = 4, u = 5, u_l = 17,
          v = 23, v_l = 8}, st = {l = 31, v_l = 4, c = 5, c_l = 17, r = 23, r_l 
= 8}}, hdr_content_len = 0, cap = 0x0}, hdr_idx = {v = 0x800c4760,
      size = 205, used = 6, last = 6, tail = 5}, flags = 457555968, meth = 
HTTP_METH_POST, status = 404, rsp = {msg_state = 36, col = 169, sov = 169,
      eoh = 167,
      sol = 0x800c8bc0 "HTTP/1.1 404 Not Found\r\nContent-Type: 
text/html;charset=utf-8\r\nContent-Length: 259\r\nDate: Fri, 19 Mar 2010 
18:39:14 GMT\r\nServer: Apache-Coyote/1.1\r\nConnection: 
close\r\n\r\n<html><head><title>WebSphere Me"...,
      eol = 0x800c8c65 "\r\n\r\n<html><head><title>WebSphere Message Brokers 
error report</title></head>\r\n<body><h1>HTTP Status 404 - Resource Not 
Found</h1>\r\nURI /customer/profile does not map to any message flow in broker 
AWTG"..., som = 169, err_pos = -2, sl = {rq = {l = 22, m_l = 8, u = 9, u_l = 3,
          v = 13, v_l = 9}, st = {l = 22, v_l = 8, c = 9, c_l = 3, r = 13, r_l 
= 9}}, hdr_content_len = 0, cap = 0x0}, uri = 0x0, cli_cookie = 0x0,
    srv_cookie = 0x0, sessid = 0x0, auth = {method = 0, method_data = {str = 
0x0, size = 0, len = 0}, user = 0x0, pass = 0x0}}, store = {{ts = 0x0,
      table = 0x0, flags = 0}, {ts = 0x0, table = 0x0, flags = 0}, {ts = 0x0, 
table = 0x0, flags = 0}, {ts = 0x0, table = 0x0, flags = 0}, {ts = 0x0,
      table = 0x0, flags = 0}, {ts = 0x0, table = 0x0, flags = 0}, {ts = 0x0, 
table = 0x0, flags = 0}, {ts = 0x0, table = 0x0, flags = 0}}, store_count = 0,
  logs = {logwait = 0, accept_date = {tv_sec = 1269023954, tv_usec = 275790}, 
tv_accept = {tv_sec = 1269023954, tv_usec = 275790}, tv_request = {
      tv_sec = 1269023954, tv_usec = 276275}, t_queue = 0, t_connect = 0, 
t_data = 1, t_close = 2, srv_queue_size = 0, prx_queue_size = 0, bytes_in = 536,
    bytes_out = 428}, do_log = 0x80024cdc <http_sess_log>, srv_error = 
0x8002363c <http_return_srv_error>, data_source = 0, data_state = 0, data_ctx = 
{
    stats = {px = 0x0, sv = 0x0, l = 0x0, px_st = 0, sv_st = 0, flags = 0, iid 
= 0, type = 0, sid = 0}, sess = {bref = {users = {n = 0x0, p = 0x0},
        ref = 0x0}, target = 0x0, uid = 0, section = 0, pos = 0}, errors = {iid 
= 0, px = 0x0, buf = 0, sid = 0, ptr = 0, bol = 0}, cli = {msg = 0x0}},
  uniq_id = 0}
(gdb)

Reply via email to