Hi Pieter,

On Mon, Oct 12, 2015 at 10:29:05PM +0200, PiBa-NL wrote:
> Been running some more tests with the information that req->buf->i 
> should be >= 0.
> 
> What i find is that after 1 request i already see rqh=-103 , it seems 
> like the initial request size which in this case is also is 103 bytes is 
> subtracted twice? It does not immediately crash, but if this is already 
> a sign of 'corruption' then the cause should be a little more easy to 
> find..

Oh yes definitely, good catch!

> @Willy can you confirm this indicates the problem could be in progress 
> of heading to a crash? Even though in the last line it restores to 0..

Absolutely. Everytime we're trying to track such a painful bug, I end
up looking for initial symptoms, like this one. In general the first
corruption is minor and undetected but once it's seeded its disease,
the problem will definitely occur.

> See its full output below replaced the DPRINTF already in code of 
> stream.c with Alert..
> 
> Thanks in advance,
> PiBa-NL
> 
> root@OPNsense:/usr/ports/net/haproxy-devel # haproxy -f 
> /var/haproxy.config -d
> [ALERT] 277/063055 (61489) : SSLv3 support requested but unavailable.
> Available polling systems :
>      kqueue : pref=300,  test result OK
>        poll : pref=200,  test result OK
>      select : pref=150,  test result FAILED
> Total: 3 (2 usable), will use kqueue.
> Using kqueue() as the polling mechanism.
> 00000000:http_frt.accept(0006)=0008 from [127.0.0.1:62358]
> [ALERT] 277/063058 (61489) : [910446771] process_stream:1655: 
> task=0x80244b9b0 s=0x80247d600, sfl=0x00000080, rq=0x80247d610, 
> rp=0x80247d650, exp(r,w)=0,0 rqf=00d08000 rpf=80000000 rqh=0 rqt=0 rph=0 
> rpt=0 cs=7 ss=0, cet=0x0 set=0x0 retr=0
> [ALERT] 277/063058 (61489) : [910446771] http_wait_for_request: 
> stream=0x80247d600 b=0x80247d610, exp(r,w)=0,0 bf=00d08000 bh=0 
> analysers=34
> [ALERT] 277/063058 (61489) : [910446772] process_stream:1655: 
> task=0x80244b9b0 s=0x80247d600, sfl=0x00000080, rq=0x80247d610, 
> rp=0x80247d650, exp(r,w)=0,0 rqf=00000002 rpf=80000000 rqh=103 rqt=0 
> rph=0 rpt=0 cs=7 ss=0, cet=0x0 set=0x0 retr=0
> [ALERT] 277/063058 (61489) : [910446772] http_wait_for_request: 
> stream=0x80247d600 b=0x80247d610, exp(r,w)=0,0 bf=00808002 bh=103 
> analysers=34
> 00000000:http_frt.clireq[0008:ffffffff]: GET / HTTP/1.1
> 00000000:http_frt.clihdr[0008:ffffffff]: Host: 127.0.0.1:801
> 00000000:http_frt.clihdr[0008:ffffffff]: Accept: */*
> 00000000:http_frt.clihdr[0008:ffffffff]: User-Agent: fetch libfetch/2.0
> 00000000:http_frt.clihdr[0008:ffffffff]: Connection: close
> [ALERT] 277/063058 (61489) : [910446772] process_switching_rules: 
> stream=0x80247d600 b=0x80247d610, exp(r,w)=0,0 bf=00808002 bh=103 
> analysers=00
> [ALERT] 277/063058 (61489) : [910446772] sess_prepare_conn_req: 
> sess=0x80247d600 rq=0x80247d610, rp=0x80247d650, exp(r,w)=0,0 
> rqf=00808002 rpf=80000000 rqh=0 rqt=103 rph=0 rpt=0 cs=7 ss=1
> [ALERT] 277/063058 (61489) : [910446772] process_stream:1655: 
> task=0x80244b9b0 s=0x80247d600, sfl=0x0000048a, rq=0x80247d610, 
> rp=0x80247d650, exp(r,w)=910456772,0 rqf=00808200 rpf=80000023 rqh=0 
> rqt=0 rph=97 rpt=0 cs=7 ss=7, cet=0x0 set=0x0 retr=0
> 00000000:http_frt.srvrep[0008:ffffffff]: HTTP/1.1 200 OK
> 00000000:http_frt.srvhdr[0008:ffffffff]: content-length: 13
> 00000000:http_frt.srvhdr[0008:ffffffff]: content-type: text/plain
> 00000000:http_frt.srvhdr[0008:ffffffff]: Connection: close
> [ALERT] 277/063058 (61489) : [910446773] process_stream:1655: 
> task=0x80244b9b0 s=0x80247d600, sfl=0x0000048a, rq=0x80247d610, 
> rp=0x80247d650, exp(r,w)=910456773,0 rqf=00808200 rpf=8004a221 rqh=-103 
> rqt=0 rph=0 rpt=0 cs=7 ss=7, cet=0x0 set=0x0 retr=0

Yep so here rqh is in fact req->buf->i and as you noticed it's been
decremented a second time.

I'm seeing this which I find suspicious in hlua.c :

  5909
  5910                  /* skip the requests bytes. */
  5911                  bo_skip(si_oc(si), strm->txn->req.eoh + 2);

First I don't understand why "eoh+2", I suspect that it's for the CRLF
in which case it's wrong since it can be a lone LF. Second, I'm not
seeing sov being reset afterwards. Could you please just add this
after this line :

       strm->txn->req.next -= strm->txn->req.sov;
       strm->txn->req.sov = 0;

That's equivalent to what we're doing when dealing with a redirect (http.c:4258
if you're curious) since we also have to "eat" the request. There may be a few
other corner cases, the use-service mechanism is fairly new and puts its feet
in a place where things used to work just because they were trained to... But
it's a terribly powerful thing to have so we must fix it even if it needs a
few -stable cycles.

Thanks!
Willy


Reply via email to