HAproxy 1.9.3, but happens also with 1.7.10, 1.7.11.

Connections are getting closed during data transfer phase at random sizes on backend. Sometimes just as little as 420 bytes get transferred, but usually more is transferred before sudden end of connection. HAproxy logs have connection closing status SD-- when this happens.

Basic components of system look like this:
Client --> HAproxy --> HTTP server --> Caching Proxy --> Remote origin

Our HTTP server part is compiling data from chunks it gets from local cache. When it receives request from client via HAproxy, it sends response header, then fetches chunks, compiles those and sends data client.

SD-- happens more frequently when connection between benchmarking tool and HAproxy is fast, e.g. when doing tests where client side is not loaded much. Happens much more for http than for https.

For example:

httpress -t1 -c10 -n1000 URL (rarely or not at all)
250 requests launched
500 requests launched
750 requests launched
1000 requests launched

TOTALS: 1000 connect, 1000 requests, 1000 success, 0 fail, 10 (10) real concurrency TRAFFIC: 667959622 avg bytes, 452 avg overhead, 667959622000 bytes, 452000 overhead
TIMING:  241.023 seconds, 4 rps, 2706393 kbps, 2410.2 ms avg req time

httpress -t10 -c10 -n1000 URL (happens frequently)

2019-01-31 08:44:15 [26361:0x7fdc91a23700]: body [0] read connection closed 2019-01-31 08:44:15 [26361:0x7fdc91a23700]: body [0] read connection closed 2019-01-31 08:44:16 [26361:0x7fdc91a23700]: body [0] read connection closed 2019-01-31 08:44:16 [26361:0x7fdc91a23700]: body [0] read connection closed 2019-01-31 08:44:17 [26361:0x7fdc91a23700]: body [0] read connection closed 2019-01-31 08:44:18 [26361:0x7fdc91a23700]: body [0] read connection closed 2019-01-31 08:44:18 [26361:0x7fdc91a23700]: body [0] read connection closed
1000 requests launched
2019-01-31 08:44:19 [26361:0x7fdc82ffd700]: body [0] read connection closed thread 6: 73 connect, 73 requests, 72 success, 1 fail, 48093092784 bytes, 32544 overhead thread 10: 72 connect, 72 requests, 72 success, 0 fail, 48093092784 bytes, 32544 overhead thread 7: 73 connect, 73 requests, 72 success, 1 fail, 48093092784 bytes, 32544 overhead thread 4: 88 connect, 88 requests, 67 success, 21 fail, 44753294674 bytes, 30284 overhead thread 9: 111 connect, 111 requests, 56 success, 55 fail, 37405738832 bytes, 25312 overhead thread 5: 82 connect, 82 requests, 68 success, 14 fail, 45421254296 bytes, 30736 overhead thread 1: 86 connect, 86 requests, 68 success, 18 fail, 45421254296 bytes, 30736 overhead thread 8: 184 connect, 184 requests, 29 success, 155 fail, 19370829038 bytes, 13108 overhead thread 3: 73 connect, 73 requests, 73 success, 0 fail, 48761052406 bytes, 32996 overhead thread 2: 158 connect, 158 requests, 39 success, 119 fail, 26050425258 bytes, 17628 overhead

TOTALS: 1000 connect, 1000 requests, 616 success, 384 fail, 10 (10) real concurrency TRAFFIC: 667959622 avg bytes, 452 avg overhead, 411463127152 bytes, 278432 overhead
TIMING:  170.990 seconds, 3 rps, 2349959 kbps, 2775.8 ms avg req time

Because of thread count differences, -t1 (one thread) test is much more loaded on client side than it is with -t10 (ten threads).

Random samples from HAproxy log (proper size of the object in HAproxy logs is 667960042 bytes for that test file).
0/0/0/0/903 200 270807819 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/375 200 101926854 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/725 200 243340623 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/574 200 183069594 - - SD-- 11/11/9/9/0 0/0
0/0/0/0/648 200 208194175 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/1130 200 270807819 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/349 200 90597175 - - SD-- 10/10/9/9/0 0/0

Our HTTP server logs contain hard unrecoverable errors about unable to write to socket when HAproxy closes connection:
Return Code: 32. Transferred 79389313 out of 667959622 Bytes in 809 msec
Return Code: 32. Transferred 198965568 out of 667959622 Bytes in 986 msec Return Code: 32. Transferred 126690257 out of 667959622 Bytes in 825 msec Return Code: 32. Transferred 270807399 out of 667959622 Bytes in 1273 msec Return Code: 32. Transferred 171663764 out of 667959622 Bytes in 1075 msec Return Code: 32. Transferred 169362556 out of 667959622 Bytes in 1146 msec Return Code: 32. Transferred 167789692 out of 667959622 Bytes in 937 msec Return Code: 32. Transferred 199752000 out of 667959622 Bytes in 1110 msec Return Code: 32. Transferred 158793496 out of 667959622 Bytes in 979 msec Return Code: 32. Transferred 240394573 out of 667959622 Bytes in 1087 msec Return Code: 32. Transferred 139962654 out of 667959622 Bytes in 918 msec Return Code: 32. Transferred 155690998 out of 667959622 Bytes in 977 msec Return Code: 32. Transferred 240394573 out of 667959622 Bytes in 1079 msec Return Code: 32. Transferred 177068702 out of 667959622 Bytes in 1060 msec Return Code: 32. Transferred 119149343 out of 667959622 Bytes in 881 msec Return Code: 32. Transferred 270807399 out of 667959622 Bytes in 1283 msec Return Code: 32. Transferred 147466569 out of 667959622 Bytes in 975 msec Return Code: 32. Transferred 202724001 out of 667959622 Bytes in 1132 msec Return Code: 32. Transferred 161645919 out of 667959622 Bytes in 1020 msec Return Code: 32. Transferred 179952286 out of 667959622 Bytes in 1196 msec Return Code: 32. Transferred 188359337 out of 667959622 Bytes in 1075 msec Return Code: 32. Transferred 152241337 out of 667959622 Bytes in 1024 msec Return Code: 32. Transferred 183069174 out of 667959622 Bytes in 1061 msec Return Code: 32. Transferred 257841674 out of 667959622 Bytes in 1237 msec Return Code: 32. Transferred 158994217 out of 667959622 Bytes in 1002 msec Return Code: 32. Transferred 161645919 out of 667959622 Bytes in 898 msec Return Code: 32. Transferred 171663764 out of 667959622 Bytes in 1032 msec Return Code: 32. Transferred 251720931 out of 667959622 Bytes in 1276 msec Return Code: 32. Transferred 158793496 out of 667959622 Bytes in 886 msec Return Code: 32. Transferred 155690998 out of 667959622 Bytes in 909 msec Return Code: 32. Transferred 165923538 out of 667959622 Bytes in 964 msec Return Code: 32. Transferred 236802864 out of 667959622 Bytes in 1179 msec Return Code: 32. Transferred 179952286 out of 667959622 Bytes in 1062 msec Return Code: 32. Transferred 196575613 out of 667959622 Bytes in 1074 msec Return Code: 32. Transferred 270807399 out of 667959622 Bytes in 1313 msec
Return Code: 32. Transferred 83133350 out of 667959622 Bytes in 742 msec
Return Code: 32. Transferred 270807399 out of 667959622 Bytes in 1271 msec Return Code: 32. Transferred 101926434 out of 667959622 Bytes in 866 msec Return Code: 32. Transferred 243340203 out of 667959622 Bytes in 1082 msec Return Code: 32. Transferred 183069174 out of 667959622 Bytes in 1068 msec Return Code: 32. Transferred 208193755 out of 667959622 Bytes in 1141 msec Return Code: 32. Transferred 270807399 out of 667959622 Bytes in 1291 msec
Return Code: 32. Transferred 90596755 out of 667959622 Bytes in 723 msec
Return Code: 32. Transferred 179952286 out of 667959622 Bytes in 1056 msec

Test run directly against HTTP server do not return any errors in any combination of test load, that's why we are suspecting that bug is in HAproxy.

Could it be that HAproxy is somehow ending wrong connection or counts bytes transferred improperly?

Any help debugging this would be highly appreciated.

$ haproxy -vv
HA-Proxy version 1.9.3 2019/01/29 - https://haproxy.org/
Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
CFLAGS = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits OPTIONS = USE_ZLIB=1 USE_THREAD=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.5
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.3
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 7.8 2008-09-05
Running on PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with multi-threading support.

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE
              h2 : mode=HTTP       side=FE
       <default> : mode=HTX        side=FE|BE
       <default> : mode=TCP|HTTP   side=FE|BE

Available filters :
        [SPOE] spoe
        [COMP] compression
        [CACHE] cache
        [TRACE] trace

Thanks,
Veiko

Reply via email to