Re: core dump, lua service, 1.6-dev6 ss-20150930

Willy Tarreau Sun, 11 Oct 2015 22:29:29 -0700

Hi Pieter,

On Mon, Oct 12, 2015 at 01:22:48AM +0200, PiBa-NL wrote:
> >>>>>>#1  0x0000000000417388 in buffer_slow_realign (buf=0x7d3c90) at
> >>>>>>src/buffer.c:166
> >>>>>>           block1 = -3306
> >>>>>>           block2 = 0


I'm puzzled by the above: no block should have a negative size.

> >>>>>>#2  0x0000000000480c42 in http_wait_for_request (s=0x80247d600,
> >>>>>>req=0x80247d610, an_bit=4)
> >>>>>>       at src/proto_http.c:2686
> >>>>>>           cur_idx = -6336
> >>>>>>           sess = (struct session *) 0x80241e400
> >>>>>>           txn = (struct http_txn *) 0x802bb2140
> >>>>>>           msg = (struct http_msg *) 0x802bb21a0
> >>>>>>           ctx = {line = 0x2711079 <Address 0x2711079 out of 
> >>>>>>bounds>, idx = 3, val = 0, vlen = 7, tws = 0, del = 33, prev = 0}

Similarly, cur_idx above shouldn't be negative.

> >Seems that buffer_slow_realign() isn't used regularly during normal 
> >haproxy operation, and it crashes first time that specific function 
> >gets called.
> >Reproduction is pretty consistent with chrome browser refreshing stats 
> >every second.
> >Then starting: wrk -c 200 -t 2 -d 10 http://127.0.0.1:801/
> >I tried adding some Alert(); items in the code to see what parameters 
> >are set at what step, but am not understanding the exact internals of 
> >that code..
> >
> >This negative bh=-7800 is not supposed to be there i think? It is from 
> >one of the dprintf statements, how are those supposed generate output?..
> >[891069718] http_wait_for_request: stream=0x80247d600 b=0x80247d610, 
> >exp(r,w)=0,0 bf=00c08200 bh=-7800 analysers=34
> >
> >Anything else i can check or provide to help get this fixed?
> >
> >Best regards,
> >PiBa-NL
> Just a little 'bump' to this issue..
> 
> Anyone know when/how this buffer_slow_realign() is supposed to work?

Yes, it's supposed to be used only when a request or response is wrapped
in the request or response buffer. It uses memcpy(), hence the "slow"
aspect of the realign.
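
To illustrate what "wrapped" means here, below is a minimal sketch of a
memcpy()-based realign on a simplified ring buffer. It is not the haproxy
code: struct demo_ring, demo_ring_realign() and the caller-supplied swap
area are hypothetical names chosen for the example. The pending data is
split into a part that runs up to the end of the storage area (block1) and
a part that continues from its start (block2).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified ring buffer; NOT haproxy's struct buffer. */
struct demo_ring {
    char  *data;  /* start of the storage area */
    size_t size;  /* total storage size in bytes */
    size_t head;  /* offset of the first pending byte */
    size_t len;   /* number of pending bytes; may wrap past the end */
};

/*
 * Move the pending bytes so that they start at offset 0.
 * "swap" is a scratch area of at least ring->size bytes.
 */
void demo_ring_realign(struct demo_ring *ring, char *swap)
{
    size_t block1, block2;

    /* The realign is only safe on a consistent buffer. */
    assert(ring->head < ring->size);
    assert(ring->len <= ring->size);

    /* block1: bytes from head up to the end of the storage area */
    block1 = ring->size - ring->head;
    if (block1 > ring->len)
        block1 = ring->len;

    /* block2: bytes that wrapped around to the start of the area */
    block2 = ring->len - block1;

    memcpy(swap, ring->data + ring->head, block1);
    memcpy(swap + block1, ring->data, block2);
    memcpy(ring->data, swap, ring->len);
    ring->head = 0;
}
```

With consistent fields, block1 and block2 are both bounded by the storage
size. If head or len are already wrong when the function is entered, the
sizes passed to memcpy() are wrong too.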

> I suspect it either contains a bug, or is called with bogus parameters..

It's very sensitive to the consistency of the buffer being realigned. Errors
such as buf->i + buf->o > buf->size, buf->p > buf->data + buf->size, or
buf->p < buf->data can lead to crashes. These must never happen at all,
though; if one does, it proves that there's a bug somewhere else.
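
The conditions listed above could be turned into a debugging check along
these lines. This is only a sketch: struct demo_buffer is a simplified
stand-in that reuses the field names from this mail (data, p, i, o, size),
its field types are an assumption, and demo_buffer_is_consistent() is a
hypothetical helper, not an existing haproxy function.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in; the layout and field types are assumptions. */
struct demo_buffer {
    char        *data;  /* start of the storage area */
    char        *p;     /* current position inside the storage area */
    unsigned int i;     /* number of input bytes */
    unsigned int o;     /* number of output bytes */
    unsigned int size;  /* total storage size */
};

/* Return 1 if none of the inconsistencies listed in the mail is present. */
int demo_buffer_is_consistent(const struct demo_buffer *buf)
{
    uintptr_t start = (uintptr_t)buf->data;
    uintptr_t pos   = (uintptr_t)buf->p;

    /* buf->i + buf->o > buf->size, written so the sum cannot overflow */
    if (buf->i > buf->size || buf->o > buf->size - buf->i)
        return 0;

    /* buf->p < buf->data */
    if (pos < start)
        return 0;

    /* buf->p > buf->data + buf->size */
    if (pos - start > buf->size)
        return 0;

    return 1;
}
```

Calling such a check (for example through assert()) right after each place
that modifies the buffer would narrow down where the corruption first
appears, rather than where it finally crashes.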

Here, since block1 is -3306 and block2 is 0, I suspect that they were assigned
at line 159 from buf->i, which definitely means that the buffer was already
corrupted.

> How can we/i determine which it is?

The difficulty lies in finding what can lead to a corrupted buffer :-/
In the past we had such issues when trying to forward more data than was
available in the buffer, due to option send-name-header. I wouldn't be
surprised if it can happen here in corner cases when building a message
from Lua, if the various message pointers are not all perfectly correct.

> Even though with a small change in the config (adding a backend) I can't
> reproduce it, that doesn't mean there isn't a problem with the function.
> The whole function doesn't seem to get called in that circumstance.

It could be related to an uninitialized variable somewhere as well. You
can try to start haproxy with "-dM" to see if it makes the issues 100%
reproducible or not. This poisons all buffers (fills them with a constant
byte 0x50 after malloc) so that we don't rely on an uninitialized zero byte
somewhere.
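
The idea behind poisoning can be sketched as follows. This is an
illustration only, not how haproxy implements "-dM": demo_alloc_poisoned()
and DEMO_POISON_BYTE are hypothetical names, and the sketch simply fills a
fresh allocation with the constant byte mentioned above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Constant fill byte, matching the 0x50 value mentioned in the mail. */
#define DEMO_POISON_BYTE 0x50

/*
 * Allocate "size" bytes and fill them with the poison byte, so that code
 * which accidentally relies on a fresh area containing zeroes fails every
 * time instead of only when the allocator returns non-zero memory.
 */
void *demo_alloc_poisoned(size_t size)
{
    void *area = malloc(size);

    if (area)
        memset(area, DEMO_POISON_BYTE, size);
    return area;
}
```

A field that is read before being initialized then always holds a
recognizable non-zero pattern (0x50505050 for a 32-bit integer), which
tends to make such bugs reproducible and easier to spot in a core dump.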

Regards,
Willy
