HAproxy 1.9.3, but happens also with 1.7.10, 1.7.11.
Connections are getting closed during data transfer phase at random
sizes on backend. Sometimes just as little as 420 bytes get transferred,
but usually more is transferred before sudden end of connection. HAproxy
logs have connection closing status SD-- when this happens.
Basic components of system look like this:
Client --> HAproxy --> HTTP server --> Caching Proxy --> Remote origin
Our HTTP server part is compiling data from chunks it gets from local
cache. When it receives request from client via HAproxy, it sends
response header, then fetches chunks, compiles those and sends data
client.
SD-- happens more frequently when connection between benchmarking tool
and HAproxy is fast, e.g. when doing tests where client side is not
loaded much. Happens much more for http than for https.
For example:
httpress -t1 -c10 -n1000 URL (rarely or not at all)
250 requests launched
500 requests launched
750 requests launched
1000 requests launched
TOTALS: 1000 connect, 1000 requests, 1000 success, 0 fail, 10 (10) real
concurrency
TRAFFIC: 667959622 avg bytes, 452 avg overhead, 667959622000 bytes,
452000 overhead
TIMING: 241.023 seconds, 4 rps, 2706393 kbps, 2410.2 ms avg req time
httpress -t10 -c10 -n1000 URL (happens frequently)
2019-01-31 08:44:15 [26361:0x7fdc91a23700]: body [0] read connection
closed
2019-01-31 08:44:15 [26361:0x7fdc91a23700]: body [0] read connection
closed
2019-01-31 08:44:16 [26361:0x7fdc91a23700]: body [0] read connection
closed
2019-01-31 08:44:16 [26361:0x7fdc91a23700]: body [0] read connection
closed
2019-01-31 08:44:17 [26361:0x7fdc91a23700]: body [0] read connection
closed
2019-01-31 08:44:18 [26361:0x7fdc91a23700]: body [0] read connection
closed
2019-01-31 08:44:18 [26361:0x7fdc91a23700]: body [0] read connection
closed
1000 requests launched
2019-01-31 08:44:19 [26361:0x7fdc82ffd700]: body [0] read connection
closed
thread 6: 73 connect, 73 requests, 72 success, 1 fail, 48093092784
bytes, 32544 overhead
thread 10: 72 connect, 72 requests, 72 success, 0 fail, 48093092784
bytes, 32544 overhead
thread 7: 73 connect, 73 requests, 72 success, 1 fail, 48093092784
bytes, 32544 overhead
thread 4: 88 connect, 88 requests, 67 success, 21 fail, 44753294674
bytes, 30284 overhead
thread 9: 111 connect, 111 requests, 56 success, 55 fail, 37405738832
bytes, 25312 overhead
thread 5: 82 connect, 82 requests, 68 success, 14 fail, 45421254296
bytes, 30736 overhead
thread 1: 86 connect, 86 requests, 68 success, 18 fail, 45421254296
bytes, 30736 overhead
thread 8: 184 connect, 184 requests, 29 success, 155 fail, 19370829038
bytes, 13108 overhead
thread 3: 73 connect, 73 requests, 73 success, 0 fail, 48761052406
bytes, 32996 overhead
thread 2: 158 connect, 158 requests, 39 success, 119 fail, 26050425258
bytes, 17628 overhead
TOTALS: 1000 connect, 1000 requests, 616 success, 384 fail, 10 (10)
real concurrency
TRAFFIC: 667959622 avg bytes, 452 avg overhead, 411463127152 bytes,
278432 overhead
TIMING: 170.990 seconds, 3 rps, 2349959 kbps, 2775.8 ms avg req time
Because of thread count differences, -t1 (one thread) test is much more
loaded on client side than it is with -t10 (ten threads).
Random samples from HAproxy log (proper size of the object in HAproxy
logs is 667960042 bytes for that test file).
0/0/0/0/903 200 270807819 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/375 200 101926854 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/725 200 243340623 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/574 200 183069594 - - SD-- 11/11/9/9/0 0/0
0/0/0/0/648 200 208194175 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/1130 200 270807819 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/349 200 90597175 - - SD-- 10/10/9/9/0 0/0
Our HTTP server logs contain hard unrecoverable errors about unable to
write to socket when HAproxy closes connection:
Return Code: 32. Transferred 79389313 out of 667959622 Bytes in 809 msec
Return Code: 32. Transferred 198965568 out of 667959622 Bytes in 986
msec
Return Code: 32. Transferred 126690257 out of 667959622 Bytes in 825
msec
Return Code: 32. Transferred 270807399 out of 667959622 Bytes in 1273
msec
Return Code: 32. Transferred 171663764 out of 667959622 Bytes in 1075
msec
Return Code: 32. Transferred 169362556 out of 667959622 Bytes in 1146
msec
Return Code: 32. Transferred 167789692 out of 667959622 Bytes in 937
msec
Return Code: 32. Transferred 199752000 out of 667959622 Bytes in 1110
msec
Return Code: 32. Transferred 158793496 out of 667959622 Bytes in 979
msec
Return Code: 32. Transferred 240394573 out of 667959622 Bytes in 1087
msec
Return Code: 32. Transferred 139962654 out of 667959622 Bytes in 918
msec
Return Code: 32. Transferred 155690998 out of 667959622 Bytes in 977
msec
Return Code: 32. Transferred 240394573 out of 667959622 Bytes in 1079
msec
Return Code: 32. Transferred 177068702 out of 667959622 Bytes in 1060
msec
Return Code: 32. Transferred 119149343 out of 667959622 Bytes in 881
msec
Return Code: 32. Transferred 270807399 out of 667959622 Bytes in 1283
msec
Return Code: 32. Transferred 147466569 out of 667959622 Bytes in 975
msec
Return Code: 32. Transferred 202724001 out of 667959622 Bytes in 1132
msec
Return Code: 32. Transferred 161645919 out of 667959622 Bytes in 1020
msec
Return Code: 32. Transferred 179952286 out of 667959622 Bytes in 1196
msec
Return Code: 32. Transferred 188359337 out of 667959622 Bytes in 1075
msec
Return Code: 32. Transferred 152241337 out of 667959622 Bytes in 1024
msec
Return Code: 32. Transferred 183069174 out of 667959622 Bytes in 1061
msec
Return Code: 32. Transferred 257841674 out of 667959622 Bytes in 1237
msec
Return Code: 32. Transferred 158994217 out of 667959622 Bytes in 1002
msec
Return Code: 32. Transferred 161645919 out of 667959622 Bytes in 898
msec
Return Code: 32. Transferred 171663764 out of 667959622 Bytes in 1032
msec
Return Code: 32. Transferred 251720931 out of 667959622 Bytes in 1276
msec
Return Code: 32. Transferred 158793496 out of 667959622 Bytes in 886
msec
Return Code: 32. Transferred 155690998 out of 667959622 Bytes in 909
msec
Return Code: 32. Transferred 165923538 out of 667959622 Bytes in 964
msec
Return Code: 32. Transferred 236802864 out of 667959622 Bytes in 1179
msec
Return Code: 32. Transferred 179952286 out of 667959622 Bytes in 1062
msec
Return Code: 32. Transferred 196575613 out of 667959622 Bytes in 1074
msec
Return Code: 32. Transferred 270807399 out of 667959622 Bytes in 1313
msec
Return Code: 32. Transferred 83133350 out of 667959622 Bytes in 742 msec
Return Code: 32. Transferred 270807399 out of 667959622 Bytes in 1271
msec
Return Code: 32. Transferred 101926434 out of 667959622 Bytes in 866
msec
Return Code: 32. Transferred 243340203 out of 667959622 Bytes in 1082
msec
Return Code: 32. Transferred 183069174 out of 667959622 Bytes in 1068
msec
Return Code: 32. Transferred 208193755 out of 667959622 Bytes in 1141
msec
Return Code: 32. Transferred 270807399 out of 667959622 Bytes in 1291
msec
Return Code: 32. Transferred 90596755 out of 667959622 Bytes in 723 msec
Return Code: 32. Transferred 179952286 out of 667959622 Bytes in 1056
msec
Test run directly against HTTP server do not return any errors in any
combination of test load, that's why we are suspecting that bug is in
HAproxy.
Could it be that HAproxy is somehow ending wrong connection or counts
bytes transferred improperly?
Any help debugging this would be highly appreciated.
$ haproxy -vv
HA-Proxy version 1.9.3 2019/01/29 - https://haproxy.org/
Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing
-Wdeclaration-after-statement -fwrapv -Wno-unused-label
-Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration
-Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers
-Wtype-limits
OPTIONS = USE_ZLIB=1 USE_THREAD=1 USE_OPENSSL=1 USE_LUA=1
USE_STATIC_PCRE=1
Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents =
200
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.5
Built with transparent proxy support using: IP_TRANSPARENT
IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.3
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"),
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 7.8 2008-09-05
Running on PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto'
keyword)
h2 : mode=HTX side=FE|BE
h2 : mode=HTTP side=FE
<default> : mode=HTX side=FE|BE
<default> : mode=TCP|HTTP side=FE|BE
Available filters :
[SPOE] spoe
[COMP] compression
[CACHE] cache
[TRACE] trace
Thanks,
Veiko