Willy,

I confirm that I still see the cD errors with 1.7.5.  I have been running
my environment which was about 80% 1.7.5 and 20% 1.7.3 for the past month
and did a final upgrade over the weekend.  To my surprise, virtually all
the hosts that were upgraded experienced the cD error later that day.

The upgrade process was straight forward, "apt-get dist-upgrade" followed
by a manual connection to test the link.

I'm at a loss of how to debug this further, can you offer any advice?

Here are the haproxy logs and the tcpdump details for one instance of the
problem.


$ haproxy -v
HA-Proxy version 1.7.5-2ppa1~trusty 2017/05/27
Copyright 2000-2017 Willy Tarreau <[email protected]>

apt log:
Start-Date: 2017-06-04  08:42:32
Commandline: apt-get dist-upgrade
Upgrade: haproxy:amd64 (1.7.3-1ppa1~trusty, 1.7.5-2ppa1~trusty)
End-Date: 2017-06-04  08:42:35


PID 1750 is version 1.7.3
PID 5090 is 1.7.5

logs:
Jun  3 23:04:44 app012 haproxy[1750]: 127.0.0.1:40017
[03/Jun/2017:23:03:40.146] ws-local servers/server1 1/66/64226 4498 10830
-- 0/0/0/0/0 0/0
Jun  4 00:37:49 app012 haproxy[1750]: 127.0.0.1:40106
[04/Jun/2017:00:36:56.128] ws-local servers/server1 1/59/53749 4505 11885
-- 0/0/0/0/0 0/0
Jun  4 08:42:35 app012 haproxy[5089]: Proxy ws-local started.
Jun  4 08:42:35 app012 haproxy[5089]: Proxy servers started.
Jun  4 08:42:35 app012 haproxy[1750]: Stopping frontend ws-local in 0 ms.
Jun  4 08:42:35 app012 haproxy[1750]: Stopping backend servers in 0 ms.
Jun  4 08:42:35 app012 haproxy[1750]: Proxy ws-local stopped (FE: 356
conns, BE: 0 conns).
Jun  4 08:42:35 app012 haproxy[1750]: Proxy servers stopped (FE: 0 conns,
BE: 356 conns).
Jun  4 08:42:35 app012 haproxy[5090]: Health check for server
servers/server1 failed, reason: Layer4 connection problem, info:
"Connection refused", check duration: 0ms, status: 0/1 DOWN.
Jun  4 08:42:35 app012 haproxy[5090]: Server servers/server1 is DOWN. 1
active and 0 backup servers left. 0 sessions active, 0 requeued, 0
remaining in queue.
Jun  4 08:42:35 app012 haproxy[5090]: servers/server1 changed its IP from
127.0.0.1 to 99.99.99.99 by mydns1/dns2.
Jun  4 08:42:40 app012 haproxy[5090]: Health check for server
servers/server2 failed, reason: Layer4 connection problem, info:
"Connection refused", check duration: 0ms, status: 0/1 DOWN.
Jun  4 08:42:40 app012 haproxy[5090]: Server servers/server2 is DOWN. 0
active and 0 backup servers left. 0 sessions active, 0 requeued, 0
remaining in queue.
Jun  4 08:42:40 app012 haproxy[5090]: backend servers has no server
available!
Jun  4 08:42:40 app012 haproxy[5090]: servers/server2 changed its IP from
127.0.0.1 to 99.99.99.99 by mydns2/dns3.
Jun  4 08:42:45 app012 haproxy[5090]: Health check for server
servers/server1 succeeded, reason: Layer4 check passed, check duration:
68ms, status: 3/3 UP.
Jun  4 08:42:45 app012 haproxy[5090]: Server servers/server1 is UP. 1
active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Jun  4 08:42:50 app012 haproxy[5090]: Health check for server
servers/server2 succeeded, reason: Layer4 check passed, check duration:
216ms, status: 3/3 UP.
Jun  4 08:42:50 app012 haproxy[5090]: Server servers/server2 is UP. 2
active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Jun  4 08:46:00 app012 haproxy[5090]: Health check for server
servers/server1 failed, reason: Health analyze, info: "Detected 1
consecutive errors, last one was: L4 unsuccessful connection", status: 2/3
UP.
Jun  4 08:46:01 app012 haproxy[5090]: 127.0.0.1:40606
[04/Jun/2017:08:46:00.271] ws-local servers/server2 502/221/935 2798 452 --
0/0/0/0/+1 0/0
Jun  4 08:46:10 app012 haproxy[5090]: Health check for server
servers/server1 succeeded, reason: Layer4 check passed, check duration:
64ms, status: 3/3 UP.
Jun  4 09:35:40 app012 haproxy[5090]: 127.0.0.1:40660
[04/Jun/2017:09:34:07.364] ws-local servers/server1 1/68/93519 4962 10825
-- 0/0/0/0/0 0/0
Jun  4 11:51:27 app012 haproxy[5090]: 127.0.0.1:40839
[04/Jun/2017:11:50:55.248] ws-local servers/server1 1/52/32022 5366 2220 --
0/0/0/0/0 0/0
Jun  4 13:52:55 app012 haproxy[5090]: 127.0.0.1:41005
[04/Jun/2017:13:52:04.501] ws-local servers/server1 1/66/51046 4520 7377 --
0/0/0/0/0 0/0
Jun  4 15:41:43 app012 haproxy[5090]: 127.0.0.1:41142
[04/Jun/2017:15:40:51.106] ws-local servers/server1 1/68/52078 4507 7658 --
0/0/0/0/0 0/0
Jun  4 16:58:23 app012 haproxy[5090]: 127.0.0.1:59601
[04/Jun/2017:16:58:22.757] ws-local servers/server1 1/62/782 6068 517 --
0/0/0/0/0 0/0
Jun  4 16:58:24 app012 haproxy[5090]: 127.0.0.1:51882
[04/Jun/2017:16:58:23.703] ws-local servers/server1 1/65/416 3059 553 --
1/1/1/1/0 0/0
Jun  4 16:58:29 app012 haproxy[5090]: 127.0.0.1:51569
[04/Jun/2017:16:58:29.192] ws-local servers/server1 1/56/397 6068 517 --
2/2/2/2/0 0/0
Jun  4 16:58:31 app012 haproxy[5090]: 127.0.0.1:56403
[04/Jun/2017:16:58:29.528] ws-local servers/server1 1/81/2261 2924 539 --
1/1/1/1/0 0/0
Jun  4 16:58:32 app012 haproxy[5090]: 127.0.0.1:46829
[04/Jun/2017:16:58:31.950] ws-local servers/server1 1/64/429 6364 529 --
1/1/1/1/0 0/0
Jun  4 16:59:24 app012 haproxy[5090]: 127.0.0.1:32783
[04/Jun/2017:16:58:24.064] ws-local servers/server1 1/56/60001 0 0 cD
0/0/0/0/0 0/0
Jun  4 20:17:17 app012 haproxy[5090]: 127.0.0.1:41497
[04/Jun/2017:20:16:07.931] ws-local servers/server1 1/71/69262 4454 10697
-- 0/0/0/0/0 0/0
Jun  4 22:00:32 app012 haproxy[5090]: 127.0.0.1:41608
[04/Jun/2017:21:59:27.620] ws-local servers/server1 1/67/64958 4484 9555 --
0/0/0/0/0 0/0
Jun  4 22:19:00 app012 haproxy[5090]: 127.0.0.1:41629
[04/Jun/2017:22:18:05.365] ws-local servers/server1 1/61/55581 4500 5720 --
0/0/0/0/0 0/0
Jun  5 09:20:58 app012 haproxy[5090]: 127.0.0.1:42308
[05/Jun/2017:09:20:08.785] ws-local servers/server1 1/53/49414 4515 8458 --
0/0/0/0/0 0/0
Jun  5 10:31:57 app012 haproxy[5090]: 127.0.0.1:42406
[05/Jun/2017:10:31:02.509] ws-local servers/server1 1/71/55080 4515 9186 --
0/0/0/0/0 0/0
Jun  5 10:46:28 app012 haproxy[5090]: 127.0.0.1:42432
[05/Jun/2017:10:45:40.011] ws-local servers/server1 1/65/48434 4485 6146 --
0/0/0/0/0 0/0
Jun  5 11:51:20 app012 haproxy[5090]: 127.0.0.1:42525
[05/Jun/2017:11:50:25.156] ws-local servers/server1 1/73/55703 4481 13552
-- 0/0/0/0/0 0/0
Jun  5 11:54:02 app012 haproxy[5090]: 127.0.0.1:42529
[05/Jun/2017:11:53:06.736] ws-local servers/server1 1/62/55864 7007 10573
-- 0/0/0/0/0 0/0
Jun  5 12:02:53 app012 haproxy[5090]: 127.0.0.1:42539
[05/Jun/2017:12:02:05.601] ws-local servers/server1 1/65/48225 4498 6249 --
0/0/0/0/0 0/0
Jun  5 12:24:31 app012 haproxy[5090]: 127.0.0.1:42572
[05/Jun/2017:12:23:35.267] ws-local servers/server1 1/54/56355 4491 7932 --
0/0/0/0/0 0/0
Jun  5 12:56:38 app012 haproxy[5090]: 127.0.0.1:42612
[05/Jun/2017:12:55:42.720] ws-local servers/server1 1/66/55318 4496 7587 --
0/0/0/0/0 0/0
Jun  5 13:02:28 app012 haproxy[5090]: 127.0.0.1:42629
[05/Jun/2017:13:01:33.449] ws-local servers/server1 1/53/55206 4509 8206 --
0/0/0/0/0 0/0
Jun  5 13:31:02 app012 haproxy[5090]: 127.0.0.1:42668
[05/Jun/2017:13:30:16.061] ws-local servers/server1 1/66/46680 4487 7111 --
0/0/0/0/0 0/0

A -> Local connection established
B -> HAPROXY connection established
C -> Local data sent to HAPROXY (but does not make it out)
D -> Remote server times out after 5 seconds
E -> Local connection closed by HAPROXY
F -> 60 seconds later HAPROXY closed connection with server
A 2017-06-04 16:58:24.064253 IP 127.0.0.1.32783 > 127.0.0.1.9011: Flags
[S], seq 4019382753, win 43690, options [mss 65495,sackOK,TS val 114809223
ecr 0,nop,wscale 7], length 0
A 2017-06-04 16:58:24.064312 IP 127.0.0.1.9011 > 127.0.0.1.32783: Flags
[S.], seq 452812023, ack 4019382754, win 43690, options [mss
65495,sackOK,TS val 114809223 ecr 114809223,nop,wscale 7], length 0
A 2017-06-04 16:58:24.064358 IP 127.0.0.1.32783 > 127.0.0.1.9011: Flags
[.], ack 1, win 342, options [nop,nop,TS val 114809223 ecr 114809223],
length 0
B 2017-06-04 16:58:24.064577 IP 10.1.77.243.37266 > 99.99.99.99.8000: Flags
[S], seq 1165186061, win 29200, options [mss 1460,sackOK,TS val 114809223
ecr 0,nop,wscale 7], length 0
C 2017-06-04 16:58:24.070532 IP 127.0.0.1.32783 > 127.0.0.1.9011: Flags
[P.], seq 1:199, ack 1, win 342, options [nop,nop,TS val 114809225 ecr
114809223], length 198
C 2017-06-04 16:58:24.070603 IP 127.0.0.1.9011 > 127.0.0.1.32783: Flags
[.], ack 199, win 350, options [nop,nop,TS val 114809225 ecr 114809225],
length 0
B 2017-06-04 16:58:24.120590 IP 99.99.99.99.8000 > 10.1.77.243.37266: Flags
[S.], seq 1520443118, ack 1165186062, win 28960, options [mss
1460,sackOK,TS val 548621250 ecr 114809223,nop,wscale 8], length 0
B 2017-06-04 16:58:24.120677 IP 10.1.77.243.37266 > 99.99.99.99.8000: Flags
[.], ack 1, win 229, options [nop,nop,TS val 114809237 ecr 548621250],
length 0
D 2017-06-04 16:58:29.189185 IP 99.99.99.99.8000 > 10.1.77.243.37266: Flags
[F.], seq 1, ack 1, win 114, options [nop,nop,TS val 548626318 ecr
114809237], length 0
E 2017-06-04 16:58:29.189337 IP 127.0.0.1.9011 > 127.0.0.1.32783: Flags
[F.], seq 1, ack 199, win 350, options [nop,nop,TS val 114810505 ecr
114809225], length 0
E 2017-06-04 16:58:29.190109 IP 127.0.0.1.32783 > 127.0.0.1.9011: Flags
[P.], seq 199:206, ack 2, win 342, options [nop,nop,TS val 114810505 ecr
114810505], length 7
E 2017-06-04 16:58:29.190182 IP 127.0.0.1.9011 > 127.0.0.1.32783: Flags
[.], ack 206, win 350, options [nop,nop,TS val 114810505 ecr 114810505],
length 0
E 2017-06-04 16:58:29.190232 IP 127.0.0.1.32783 > 127.0.0.1.9011: Flags
[F.], seq 206, ack 2, win 342, options [nop,nop,TS val 114810505 ecr
114810505], length 0
E 2017-06-04 16:58:29.190261 IP 127.0.0.1.9011 > 127.0.0.1.32783: Flags
[.], ack 207, win 350, options [nop,nop,TS val 114810505 ecr 114810505],
length 0
F 2017-06-04 16:58:29.193195 IP 10.1.77.243.37266 > 99.99.99.99.8000: Flags
[.], ack 2, win 229, options [nop,nop,TS val 114810506 ecr 548626318],
length 0
F 2017-06-04 16:59:24.066271 IP 10.1.77.243.37266 > 99.99.99.99.8000: Flags
[F.], seq 1, ack 2, win 229, options [nop,nop,TS val 114824224 ecr
548626318], length 0
F 2017-06-04 16:59:24.119154 IP 99.99.99.99.8000 > 10.1.77.243.37266: Flags
[R], seq 1520443120, win 0, length 0


--
lfs

On Thu, Apr 13, 2017 at 1:58 AM, Willy Tarreau <[email protected]> wrote:

> Hi Lincoln,
>
> On Wed, Apr 12, 2017 at 08:24:41PM -0400, Lincoln Stern wrote:
> (...)
> > *haproxy finishes connecting to the server (SYNACK/ACK) (good)*
> >     38.120527 IP 99.99.99.99.8000 > 10.10.10.10.34289: Flags [S.], seq
> > 4125907118, ack 333335568, win 28960, options [mss 1460,sackOK,TS val
> > 662461622 ecr 82055529,nop,wscale 8], length 0
> >     38.120619 IP 10.10.10.10.34289 > 99.99.99.99.8000: Flags [.], ack 1,
> > win 229, options [nop,nop,TS val 82055545 ecr 662461622], length 0
> >
> > *And now there is nothing for 5 seconds when we should have seen a data
> > packet from haproxy to the server with the "length 198" payload. *
> >
> > *It appears that haproxy never tried to send the data!?!?  *
>
> Thanks for this capture! This is a regression introduced in 1.7.3 while
> trying to fix a bug in 1.7.2, it was fixed later by this commit :
>
>   commit 51f2336ef7738c945ccedf5f1179825ac401862c
>   Author: Willy Tarreau <[email protected]>
>   Date:   Sat Mar 18 17:11:37 2017 +0100
>
>     BUG/MAJOR: stream-int: do not depend on connection flags to detect
> connectio
>     (...)
>
> Please upgrade to 1.7.5 and it will be OK.
>
> Regards,
> Willy
>

Reply via email to