Hi guys, On Thu, Aug 04, 2016 at 07:02:50PM +0200, Lukas Tribus wrote: > > It may be very useful to build with libslz instead of building without > > zlib. It would stress the exact same code paths in haproxy, you would > > still get compression and we'd see if the issue can be reproduced. > > While googling around I found another report [3], a similar/same crash in > memcpy() while using zlib. Apparently switching to libslz fixed the issue > for them.
Now I have found :-) Thanks to the 3 reports we have accumulated, I noticed that we were each time flushing the pending data. And the flush function doesn't set next_in nor avail_in before calling deflate(), probably because it doesn't seem necessary (given the low crash rate I'd argue it rarely is). But when reading the zlib code, it was obvious to me that these ones could be used, to the point that I could force the crash by setting them to some junk. The reason why we didn't get this in 1.5 is that during 1.5 a buffer was assigned to the same session during all its life, so by not setting the values there, we in fact used to leave the earlier ones in place pointing to a valid location. Since we introduced dynamic buffers in 1.6, this is no longer true and buffers can vanish at any moment. Thus I'd encourage those still facing the bug to apply the attached patch and to report their experience. It works on both 1.7 and 1.6. Regards, Willy
>From d8b8b5329e0892b9f0c832f0e377dbe8e9b32aba Mon Sep 17 00:00:00 2001 From: Willy Tarreau <[email protected]> Date: Mon, 8 Aug 2016 16:41:01 +0200 Subject: BUG/MAJOR: compression: initialize avail_in/next_in even during flush For quite some time, a few users have been experiencing random crashes when compressing with zlib, from versions 1.2.3 to 1.2.8 included. Upon thourough investigation in zlib's deflate.c, it appeared obvious that avail_in and next_in are used during the flush operation and need to be initialized, while admittedly it's not obvious in the documentation. By simply forcing both values to -1 it's possible to immediately reproduce the exact crash that these users have been experiencing : (gdb) bt #0 0x00007fa73ce10c00 in __memcpy_sse2 () from /lib64/libc.so.6 #1 0x00007fa73e0c5d49 in ?? () from /lib64/libz.so.1 #2 0x00007fa73e0c68e0 in ?? () from /lib64/libz.so.1 #3 0x00007fa73e0c73c7 in deflate () from /lib64/libz.so.1 #4 0x00000000004dca1c in deflate_flush_or_finish (comp_ctx=0x7b6580, out=0x7fa73e5bd010, flag=2) at src/compression.c:808 #5 0x00000000004dcb60 in deflate_flush (comp_ctx=0x7b6580, out=0x7fa73e5bd010) at src/compression.c:835 #6 0x00000000004dbc50 in http_compression_buffer_end (s=0x7c0050, in=0x7c00a8, out=0x78adf0 <tmpbuf.24662>, end=0) at src/compression.c:249 #7 0x000000000048bb5f in http_response_forward_body (s=0x7c0050, res=0x7c00a0, an_bit=1048576) at src/proto_http.c:7173 #8 0x00000000004cce54 in process_stream (t=0x7bffd8) at src/stream.c:1939 #9 0x0000000000427ddf in process_runnable_tasks () at src/task.c:238 #10 0x0000000000419892 in run_poll_loop () at src/haproxy.c:1573 #11 0x000000000041a4a5 in main (argc=4, argv=0x7fffcda38348) at src/haproxy.c:1933 Note that for all reports deflate_flush_or_finish() was always involved. The crash is very hard to reproduce when using regular traffic because it requires that the combination of avail_in and next_in are inadequate so that the memcpy() call reads out of bounds. But this can very likely happen when the input buffer points to an area reused by another stream when the flush has been interrupted due to a full output buffer. This also explains why this report is recent, as dynamic buffer allocation was introduced in 1.6. Anyway it's not acceptable to call a function with a randomly set input buffer. The deflate() function explicitly checks for the case where both avail_in and next_in are null and doesn't use it in this case during a flush, so this is the best solution. Special thanks to Sasha Litvak, James Hartshorn and Paul Bauer for reporting very useful stack traces which were critical to finding the root cause of this bug. This fix must be backported into 1.6 and 1.5, though 1.5 is less likely to trigger this case given that it keeps its own buffers allocated all along the session's life. --- src/compression.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/compression.c b/src/compression.c index 8d09585..4e13bae 100644 --- a/src/compression.c +++ b/src/compression.c @@ -565,6 +565,8 @@ static int deflate_flush_or_finish(struct comp_ctx *comp_ctx, struct buffer *out int out_len = 0; z_stream *strm = &comp_ctx->strm; + strm->next_in = NULL; + strm->avail_in = 0; strm->next_out = (unsigned char *)bi_end(out); strm->avail_out = out->size - buffer_len(out); -- 1.7.12.1

