Hi Pieter, On Mon, Oct 12, 2015 at 01:22:48AM +0200, PiBa-NL wrote: > >>>>>>#1 0x0000000000417388 in buffer_slow_realign (buf=0x7d3c90) at > >>>>>>src/buffer.c:166 > >>>>>> block1 = -3306 > >>>>>> block2 = 0
I'm puzzled by this above, no block should have a negative size. > >>>>>>#2 0x0000000000480c42 in http_wait_for_request (s=0x80247d600, > >>>>>>req=0x80247d610, an_bit=4) > >>>>>> at src/proto_http.c:2686 > >>>>>> cur_idx = -6336 > >>>>>> sess = (struct session *) 0x80241e400 > >>>>>> txn = (struct http_txn *) 0x802bb2140 > >>>>>> msg = (struct http_msg *) 0x802bb21a0 > >>>>>> ctx = {line = 0x2711079 <Address 0x2711079 out of > >>>>>>bounds>, idx = 3, val = 0, vlen = 7, tws = 0, del = 33, prev = 0} And this above, similarly cur_idx shouldn't be negative. > >Seems that buffer_slow_realign() isn't used regularly during normal > >haproxy operation, and it crashes first time that specific function > >gets called. > >Reproduction is pretty consistent with chrome browser refreshing stats > >every second. > >Then starting: wrk -c 200 -t 2 -d 10 http://127.0.0.1:801/ > >I tried adding some Alert(); items in the code to see what parameters > >are set at what step, but am not understanding the exact internals of > >that code.. > > > >This negative bh=-7800 is not supposed to be there i think? It is from > >one of the dprintf statements, how are those supposed generate output?.. > >[891069718] http_wait_for_request: stream=0x80247d600 b=0x80247d610, > >exp(r,w)=0,0 bf=00c08200 bh=-7800 analysers=34 > > > >Anything else i can check or provide to help get this fixed? > > > >Best regards, > >PiBa-NL > Just a little 'bump' to this issue.. > > Anyone know when/how this buffer_slow_realign() is suppose to work? Yes, it's supposed to be used only when a request or response is wrapped in the request or response buffer. It uses memcpy(), hence the "slow" aspect of the realign. > I suspect it either contains a bug, or is called with bogus parameters.. It's very sensitive to the consistency of the buffer being realigned. So errors such as buf->i + buf->o > buf->size, or buf->p > buf->data + buf->size, or buf->p < buf->data etc... can lead to crashes. But these must never happen at all otherwise it proves that there's a bug somewhere else. Here since block1 is -3306 and block2 = 0, I suspect that they were assigned at line 159 from buf->i, which definitely means that the buffer was already corrupted. > How can we/i determine which it is? The difficulty consists in finding what can lead to a corrupted buffer :-/ In the past we had such issues when trying to forward more data than was available in the buffer, due to option send-name-header. I wouldn't be surprized that it can happen here on corner cases when building a message from Lua if the various message pointers are not all perfectly correct. > Even though with a small change in the config (adding a backend) i cant > reproduce it, that doesnt mean there isn't a problem with the fuction.. > As the whole function doesn't seem to get called in that circumstance.. It could be related to an uninitialized variable somewhere as well. You can try to start haproxy with "-dM" to see if it makes the issues 100% reproducible or not. This poisons all buffers (fills them with a constant byte 0x50 after malloc) so that we don't rely on an uninitialized zero byte somewhere. Regards, Willy