Willy, I confirm that I still see the cD errors with 1.7.5. I have been running my environment which was about 80% 1.7.5 and 20% 1.7.3 for the past month and did a final upgrade over the weekend. To my surprise, virtually all the hosts that were upgraded experienced the cD error later that day.
The upgrade process was straight forward, "apt-get dist-upgrade" followed by a manual connection to test the link. I'm at a loss of how to debug this further, can you offer any advice? Here are the haproxy logs and the tcpdump details for one instance of the problem. $ haproxy -v HA-Proxy version 1.7.5-2ppa1~trusty 2017/05/27 Copyright 2000-2017 Willy Tarreau <[email protected]> apt log: Start-Date: 2017-06-04 08:42:32 Commandline: apt-get dist-upgrade Upgrade: haproxy:amd64 (1.7.3-1ppa1~trusty, 1.7.5-2ppa1~trusty) End-Date: 2017-06-04 08:42:35 PID 1750 is version 1.7.3 PID 5090 is 1.7.5 logs: Jun 3 23:04:44 app012 haproxy[1750]: 127.0.0.1:40017 [03/Jun/2017:23:03:40.146] ws-local servers/server1 1/66/64226 4498 10830 -- 0/0/0/0/0 0/0 Jun 4 00:37:49 app012 haproxy[1750]: 127.0.0.1:40106 [04/Jun/2017:00:36:56.128] ws-local servers/server1 1/59/53749 4505 11885 -- 0/0/0/0/0 0/0 Jun 4 08:42:35 app012 haproxy[5089]: Proxy ws-local started. Jun 4 08:42:35 app012 haproxy[5089]: Proxy servers started. Jun 4 08:42:35 app012 haproxy[1750]: Stopping frontend ws-local in 0 ms. Jun 4 08:42:35 app012 haproxy[1750]: Stopping backend servers in 0 ms. Jun 4 08:42:35 app012 haproxy[1750]: Proxy ws-local stopped (FE: 356 conns, BE: 0 conns). Jun 4 08:42:35 app012 haproxy[1750]: Proxy servers stopped (FE: 0 conns, BE: 356 conns). Jun 4 08:42:35 app012 haproxy[5090]: Health check for server servers/server1 failed, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms, status: 0/1 DOWN. Jun 4 08:42:35 app012 haproxy[5090]: Server servers/server1 is DOWN. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue. Jun 4 08:42:35 app012 haproxy[5090]: servers/server1 changed its IP from 127.0.0.1 to 99.99.99.99 by mydns1/dns2. Jun 4 08:42:40 app012 haproxy[5090]: Health check for server servers/server2 failed, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms, status: 0/1 DOWN. Jun 4 08:42:40 app012 haproxy[5090]: Server servers/server2 is DOWN. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue. Jun 4 08:42:40 app012 haproxy[5090]: backend servers has no server available! Jun 4 08:42:40 app012 haproxy[5090]: servers/server2 changed its IP from 127.0.0.1 to 99.99.99.99 by mydns2/dns3. Jun 4 08:42:45 app012 haproxy[5090]: Health check for server servers/server1 succeeded, reason: Layer4 check passed, check duration: 68ms, status: 3/3 UP. Jun 4 08:42:45 app012 haproxy[5090]: Server servers/server1 is UP. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue. Jun 4 08:42:50 app012 haproxy[5090]: Health check for server servers/server2 succeeded, reason: Layer4 check passed, check duration: 216ms, status: 3/3 UP. Jun 4 08:42:50 app012 haproxy[5090]: Server servers/server2 is UP. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue. Jun 4 08:46:00 app012 haproxy[5090]: Health check for server servers/server1 failed, reason: Health analyze, info: "Detected 1 consecutive errors, last one was: L4 unsuccessful connection", status: 2/3 UP. Jun 4 08:46:01 app012 haproxy[5090]: 127.0.0.1:40606 [04/Jun/2017:08:46:00.271] ws-local servers/server2 502/221/935 2798 452 -- 0/0/0/0/+1 0/0 Jun 4 08:46:10 app012 haproxy[5090]: Health check for server servers/server1 succeeded, reason: Layer4 check passed, check duration: 64ms, status: 3/3 UP. Jun 4 09:35:40 app012 haproxy[5090]: 127.0.0.1:40660 [04/Jun/2017:09:34:07.364] ws-local servers/server1 1/68/93519 4962 10825 -- 0/0/0/0/0 0/0 Jun 4 11:51:27 app012 haproxy[5090]: 127.0.0.1:40839 [04/Jun/2017:11:50:55.248] ws-local servers/server1 1/52/32022 5366 2220 -- 0/0/0/0/0 0/0 Jun 4 13:52:55 app012 haproxy[5090]: 127.0.0.1:41005 [04/Jun/2017:13:52:04.501] ws-local servers/server1 1/66/51046 4520 7377 -- 0/0/0/0/0 0/0 Jun 4 15:41:43 app012 haproxy[5090]: 127.0.0.1:41142 [04/Jun/2017:15:40:51.106] ws-local servers/server1 1/68/52078 4507 7658 -- 0/0/0/0/0 0/0 Jun 4 16:58:23 app012 haproxy[5090]: 127.0.0.1:59601 [04/Jun/2017:16:58:22.757] ws-local servers/server1 1/62/782 6068 517 -- 0/0/0/0/0 0/0 Jun 4 16:58:24 app012 haproxy[5090]: 127.0.0.1:51882 [04/Jun/2017:16:58:23.703] ws-local servers/server1 1/65/416 3059 553 -- 1/1/1/1/0 0/0 Jun 4 16:58:29 app012 haproxy[5090]: 127.0.0.1:51569 [04/Jun/2017:16:58:29.192] ws-local servers/server1 1/56/397 6068 517 -- 2/2/2/2/0 0/0 Jun 4 16:58:31 app012 haproxy[5090]: 127.0.0.1:56403 [04/Jun/2017:16:58:29.528] ws-local servers/server1 1/81/2261 2924 539 -- 1/1/1/1/0 0/0 Jun 4 16:58:32 app012 haproxy[5090]: 127.0.0.1:46829 [04/Jun/2017:16:58:31.950] ws-local servers/server1 1/64/429 6364 529 -- 1/1/1/1/0 0/0 Jun 4 16:59:24 app012 haproxy[5090]: 127.0.0.1:32783 [04/Jun/2017:16:58:24.064] ws-local servers/server1 1/56/60001 0 0 cD 0/0/0/0/0 0/0 Jun 4 20:17:17 app012 haproxy[5090]: 127.0.0.1:41497 [04/Jun/2017:20:16:07.931] ws-local servers/server1 1/71/69262 4454 10697 -- 0/0/0/0/0 0/0 Jun 4 22:00:32 app012 haproxy[5090]: 127.0.0.1:41608 [04/Jun/2017:21:59:27.620] ws-local servers/server1 1/67/64958 4484 9555 -- 0/0/0/0/0 0/0 Jun 4 22:19:00 app012 haproxy[5090]: 127.0.0.1:41629 [04/Jun/2017:22:18:05.365] ws-local servers/server1 1/61/55581 4500 5720 -- 0/0/0/0/0 0/0 Jun 5 09:20:58 app012 haproxy[5090]: 127.0.0.1:42308 [05/Jun/2017:09:20:08.785] ws-local servers/server1 1/53/49414 4515 8458 -- 0/0/0/0/0 0/0 Jun 5 10:31:57 app012 haproxy[5090]: 127.0.0.1:42406 [05/Jun/2017:10:31:02.509] ws-local servers/server1 1/71/55080 4515 9186 -- 0/0/0/0/0 0/0 Jun 5 10:46:28 app012 haproxy[5090]: 127.0.0.1:42432 [05/Jun/2017:10:45:40.011] ws-local servers/server1 1/65/48434 4485 6146 -- 0/0/0/0/0 0/0 Jun 5 11:51:20 app012 haproxy[5090]: 127.0.0.1:42525 [05/Jun/2017:11:50:25.156] ws-local servers/server1 1/73/55703 4481 13552 -- 0/0/0/0/0 0/0 Jun 5 11:54:02 app012 haproxy[5090]: 127.0.0.1:42529 [05/Jun/2017:11:53:06.736] ws-local servers/server1 1/62/55864 7007 10573 -- 0/0/0/0/0 0/0 Jun 5 12:02:53 app012 haproxy[5090]: 127.0.0.1:42539 [05/Jun/2017:12:02:05.601] ws-local servers/server1 1/65/48225 4498 6249 -- 0/0/0/0/0 0/0 Jun 5 12:24:31 app012 haproxy[5090]: 127.0.0.1:42572 [05/Jun/2017:12:23:35.267] ws-local servers/server1 1/54/56355 4491 7932 -- 0/0/0/0/0 0/0 Jun 5 12:56:38 app012 haproxy[5090]: 127.0.0.1:42612 [05/Jun/2017:12:55:42.720] ws-local servers/server1 1/66/55318 4496 7587 -- 0/0/0/0/0 0/0 Jun 5 13:02:28 app012 haproxy[5090]: 127.0.0.1:42629 [05/Jun/2017:13:01:33.449] ws-local servers/server1 1/53/55206 4509 8206 -- 0/0/0/0/0 0/0 Jun 5 13:31:02 app012 haproxy[5090]: 127.0.0.1:42668 [05/Jun/2017:13:30:16.061] ws-local servers/server1 1/66/46680 4487 7111 -- 0/0/0/0/0 0/0 A -> Local connection established B -> HAPROXY connection established C -> Local data sent to HAPROXY (but does not make it out) D -> Remote server times out after 5 seconds E -> Local connection closed by HAPROXY F -> 60 seconds later HAPROXY closed connection with server A 2017-06-04 16:58:24.064253 IP 127.0.0.1.32783 > 127.0.0.1.9011: Flags [S], seq 4019382753, win 43690, options [mss 65495,sackOK,TS val 114809223 ecr 0,nop,wscale 7], length 0 A 2017-06-04 16:58:24.064312 IP 127.0.0.1.9011 > 127.0.0.1.32783: Flags [S.], seq 452812023, ack 4019382754, win 43690, options [mss 65495,sackOK,TS val 114809223 ecr 114809223,nop,wscale 7], length 0 A 2017-06-04 16:58:24.064358 IP 127.0.0.1.32783 > 127.0.0.1.9011: Flags [.], ack 1, win 342, options [nop,nop,TS val 114809223 ecr 114809223], length 0 B 2017-06-04 16:58:24.064577 IP 10.1.77.243.37266 > 99.99.99.99.8000: Flags [S], seq 1165186061, win 29200, options [mss 1460,sackOK,TS val 114809223 ecr 0,nop,wscale 7], length 0 C 2017-06-04 16:58:24.070532 IP 127.0.0.1.32783 > 127.0.0.1.9011: Flags [P.], seq 1:199, ack 1, win 342, options [nop,nop,TS val 114809225 ecr 114809223], length 198 C 2017-06-04 16:58:24.070603 IP 127.0.0.1.9011 > 127.0.0.1.32783: Flags [.], ack 199, win 350, options [nop,nop,TS val 114809225 ecr 114809225], length 0 B 2017-06-04 16:58:24.120590 IP 99.99.99.99.8000 > 10.1.77.243.37266: Flags [S.], seq 1520443118, ack 1165186062, win 28960, options [mss 1460,sackOK,TS val 548621250 ecr 114809223,nop,wscale 8], length 0 B 2017-06-04 16:58:24.120677 IP 10.1.77.243.37266 > 99.99.99.99.8000: Flags [.], ack 1, win 229, options [nop,nop,TS val 114809237 ecr 548621250], length 0 D 2017-06-04 16:58:29.189185 IP 99.99.99.99.8000 > 10.1.77.243.37266: Flags [F.], seq 1, ack 1, win 114, options [nop,nop,TS val 548626318 ecr 114809237], length 0 E 2017-06-04 16:58:29.189337 IP 127.0.0.1.9011 > 127.0.0.1.32783: Flags [F.], seq 1, ack 199, win 350, options [nop,nop,TS val 114810505 ecr 114809225], length 0 E 2017-06-04 16:58:29.190109 IP 127.0.0.1.32783 > 127.0.0.1.9011: Flags [P.], seq 199:206, ack 2, win 342, options [nop,nop,TS val 114810505 ecr 114810505], length 7 E 2017-06-04 16:58:29.190182 IP 127.0.0.1.9011 > 127.0.0.1.32783: Flags [.], ack 206, win 350, options [nop,nop,TS val 114810505 ecr 114810505], length 0 E 2017-06-04 16:58:29.190232 IP 127.0.0.1.32783 > 127.0.0.1.9011: Flags [F.], seq 206, ack 2, win 342, options [nop,nop,TS val 114810505 ecr 114810505], length 0 E 2017-06-04 16:58:29.190261 IP 127.0.0.1.9011 > 127.0.0.1.32783: Flags [.], ack 207, win 350, options [nop,nop,TS val 114810505 ecr 114810505], length 0 F 2017-06-04 16:58:29.193195 IP 10.1.77.243.37266 > 99.99.99.99.8000: Flags [.], ack 2, win 229, options [nop,nop,TS val 114810506 ecr 548626318], length 0 F 2017-06-04 16:59:24.066271 IP 10.1.77.243.37266 > 99.99.99.99.8000: Flags [F.], seq 1, ack 2, win 229, options [nop,nop,TS val 114824224 ecr 548626318], length 0 F 2017-06-04 16:59:24.119154 IP 99.99.99.99.8000 > 10.1.77.243.37266: Flags [R], seq 1520443120, win 0, length 0 -- lfs On Thu, Apr 13, 2017 at 1:58 AM, Willy Tarreau <[email protected]> wrote: > Hi Lincoln, > > On Wed, Apr 12, 2017 at 08:24:41PM -0400, Lincoln Stern wrote: > (...) > > *haproxy finishes connecting to the server (SYNACK/ACK) (good)* > > 38.120527 IP 99.99.99.99.8000 > 10.10.10.10.34289: Flags [S.], seq > > 4125907118, ack 333335568, win 28960, options [mss 1460,sackOK,TS val > > 662461622 ecr 82055529,nop,wscale 8], length 0 > > 38.120619 IP 10.10.10.10.34289 > 99.99.99.99.8000: Flags [.], ack 1, > > win 229, options [nop,nop,TS val 82055545 ecr 662461622], length 0 > > > > *And now there is nothing for 5 seconds when we should have seen a data > > packet from haproxy to the server with the "length 198" payload. * > > > > *It appears that haproxy never tried to send the data!?!? * > > Thanks for this capture! This is a regression introduced in 1.7.3 while > trying to fix a bug in 1.7.2, it was fixed later by this commit : > > commit 51f2336ef7738c945ccedf5f1179825ac401862c > Author: Willy Tarreau <[email protected]> > Date: Sat Mar 18 17:11:37 2017 +0100 > > BUG/MAJOR: stream-int: do not depend on connection flags to detect > connectio > (...) > > Please upgrade to 1.7.5 and it will be OK. > > Regards, > Willy >

