Hi, Willy!

This kind of errors only happen on proxy-sections with ssl-enabled backends
('ssl verify none' in server lines).
In order to find out what realy happens from network point of view I added
one plain-http backend to one of the proxy-sections.
Then I captured the sutuation when request failed on this plain-http
backend.

interesting parameters from config:
  timeout connect 50
  timeout queue   1s
  retries 1
servers lines look like this:
  default-server weight 50 on-error fastinter
  server BACKEND_A:9790 10.10.10.10:9790 weight 100 check ssl verify none
observe layer7
  server BACKEND_B:9791 10.10.10.11:9791 weight 100 check observe layer7

Now I'll show you 3 tcp-sessions, I've captured:
The first session (from client to haproxy instance):
1 09:10:48.222378 IP6 127.0.0.1.52726 > 127.0.0.1.link: Flags [S], seq
3359830899, win 43690, options [mss 65476,sackOK,TS val 3131804957 ecr
0,nop,wscale 11], length 0
2 09:10:48.222388 IP6 127.0.0.1.link > 127.0.0.1.52726: Flags [S.], seq
1294278968, ack 3359830900, win 43690, options [mss 65476,sackOK,TS val
3131804957 ecr 3131804957,nop,wscale 11], length 0
3 09:10:48.222397 IP6 127.0.0.1.52726 > 127.0.0.1.link: Flags [.], ack 1,
win 22, options [nop,nop,TS val 3131804957 ecr 3131804957], length 0
4 09:10:48.222449 IP6 127.0.0.1.52726 > 127.0.0.1.link: Flags [P.], seq
1:3505, ack 1, win 22, options [nop,nop,TS val 3131804957 ecr 3131804957],
length 3504
5 09:10:48.222458 IP6 127.0.0.1.link > 127.0.0.1.52726: Flags [.], ack
3505, win 86, options [nop,nop,TS val 3131804957 ecr 3131804957], length 0
6 09:10:48.272790 IP6 127.0.0.1.link > 127.0.0.1.52726: Flags [F.], seq 1,
ack 3505, win 86, options [nop,nop,TS val 3131805008 ecr 3131804957],
length 0
7 09:10:48.272836 IP6 127.0.0.1.52726 > 127.0.0.1.link: Flags [F.], seq
3505, ack 2, win 22, options [nop,nop,TS val 3131805008 ecr 3131805008],
length 0
8 09:10:48.272844 IP6 127.0.0.1.link > 127.0.0.1.52726: Flags [.], ack
3506, win 86, options [nop,nop,TS val 3131805008 ecr 3131805008], length 0

As you can see the client sent request to the haproxy instance (packet #4).
The instance acknoledged it (packet #5).
And then 50.332ms after haproxy answered with FIN with no data (packet #6,
"length 0").

2nd session (from haproxy to ssl-enabled backend A, dumped with tshark for
better readability):
1 09:10:48.222518 HAPROXY → BACKEND_A TCP 94 36568 → 9790 [SYN] Seq=0
Win=26520 Len=0 MSS=8840 SACK_PERM=1 TSval=3064071282 TSecr=0 WS=2048
2 09:10:48.222624 BACKEND_A → HAPROXY TCP 94 9790 → 36568 [SYN, ACK] Seq=0
Ack=1 Win=26784 Len=0 MSS=8940 SACK_PERM=1 TSval=3366865490
TSecr=3064071282 WS=256
3 09:10:48.222639 HAPROXY → BACKEND_A TCP 86 36568 → 9790 [ACK] Seq=1 Ack=1
Win=26624 Len=0 TSval=3064071283 TSecr=3366865490
4 09:10:48.222658 HAPROXY → BACKEND_A TLSv1 603 Client Hello
5 09:10:48.222741 BACKEND_A → HAPROXY TCP 86 9790 → 36568 [ACK] Seq=1
Ack=518 Win=27904 Len=0 TSval=3366865490 TSecr=3064071283
6 09:10:48.272165 HAPROXY → BACKEND_A TCP 86 36568 → 9790 [RST, ACK]
Seq=518 Ack=1 Win=26624 Len=0 TSval=3064071332 TSecr=3366865490

Backend didn't answer with Server Hello in 49.5ms after tcp-handshed has
finished for some reason. That is the root case of the error!!!

The last session (from haproxy to plain-http backend B):
1 09:10:48.272235 HAPROXY → BACKEND_B TCP 94 33532 → 9791 [SYN] Seq=0
Win=26520 Len=0 MSS=8840 SACK_PERM=1 TSval=561683483 TSecr=0 WS=2048
2 09:10:48.272358 BACKEND_B → HAPROXY TCP 94 9791 → 33532 [SYN, ACK] Seq=0
Ack=1 Win=26784 Len=0 MSS=8940 SACK_PERM=1 TSval=874005989 TSecr=561683483
WS=256
3 09:10:48.272369 HAPROXY → BACKEND_B TCP 86 33532 → 9791 [ACK] Seq=1 Ack=1
Win=26624 Len=0 TSval=561683483 TSecr=874005989
4 09:10:48.272396 HAPROXY → BACKEND_B HTTP 3590 GET /xxxxxx/xxxxxx/xxx
HTTP/1.1
5 09:10:48.272448 HAPROXY → BACKEND_B TCP 86 33532 → 9791 [FIN, ACK]
Seq=3505 Ack=1 Win=26624 Len=0 TSval=561683483 TSecr=874005989
6 09:10:48.272529 BACKEND_B → HAPROXY TCP 86 9791 → 33532 [ACK] Seq=1
Ack=3505 Win=33792 Len=0 TSval=874005989 TSecr=561683483
7 09:10:48.272729 BACKEND_B → HAPROXY TCP 86 9791 → 33532 [FIN, ACK] Seq=1
Ack=3506 Win=33792 Len=0 TSval=874005989 TSecr=561683483
8 09:10:48.272736 HAPROXY → BACKEND_B TCP 86 33532 → 9791 [ACK] Seq=3506
Ack=2 Win=26624 Len=0 TSval=561683484 TSecr=874005989

As you can see, haproxy instance made another try to establish connection
and it did succeed but 50ms are over, and FIN was send right after
GET-request.

Conclusion:
* Haproxy does not respond with 502 in case of timing out on ssl-connection
establishing to backends
* Seems strange to me that connection timer was not reset after the first
unsuccessfull connection ("retries 1" was set)
* SD-status of error is confusing :)

--
Best regards,
Maksim Kupriianov

чт, 23 мая 2019 г. в 06:40, Willy Tarreau <w...@1wt.eu>:

> Hi Maksim,
>
> On Tue, May 21, 2019 at 01:47:30PM +0300, ?????? ????????? wrote:
> > Hi!
> >
> > I've run into some weird problem of many connections failed with SD
> status
> > in log. And I have no idea how to discover the source of the problem.
> >
> > >From the point of client it looks like this:
> > * Client (located on the same machine as haproxy) successfully opens a
> > connection to haproxy over loopback interfaces and sends http/1.1
> > get-request.
> > * haproxy accepts request (ack was send)
> > * after some time passed (usually 1ms longer then "timeout connect" set)
> > haproxy just sends tcp-FIN with no user data in the packet.
> >
> > >From the point of haproxy I see this:
> > 2019-05-21T13:30:02+03:00 haproxy[608365] local2.info 127.0.0.1:54064
> > [21/May/2019:13:30:01.289] some-host some-host/backend:80
> 0/1000/0/-1/1001
> > -1 0 - - SD-- 164/19/18/0/+1 0/0 "GET /some-request HTTP/1.1"
> >
> > And it doesn't matter how many retries I set for this backend section in
> > config.
> > Can you give me a hint, how to discover a real reason  of this errors?
>
> It looks like a bug. You should have a 502 error sent to the client,
> not just a silent close (unless you're purposely sending an empty 502
> error file of course). Well in fact if haproxy really blocked on data
> but received correct headers, it's normal not to forge a 502 response,
> but there should be the server's headers at least.
>
> Now what I'm seeing at this point from the log is that haproxy successfully
> connected to the server but didn't receive valid headers in response. The
> "SD" status is thus not correct. You should have "SH" there in this case.
> You can try to issue "show errors" on the CLI to see if something invalid
> was detected in the response. If you don't have anything, then you'll have
> to start tcpdump between haproxy and the server. Given that the termination
> state doesn't match the timers, there is a bug somewhere, so everything
> is imaginable from this point. I'm interested in knowing what case triggers
> this, of course.
>
> Regards,
> Willy
>

Reply via email to