Hello Willy,

# Sorry, I mistakenly posted this message directly to you,
# so I am resending it to the mailing list.

I tested the patches you suggested, and got
the following results.

TestA haproxy-1.8.30 + patch(1 + 2 + 3)
  Shorter HTML: 0
  HTTP 408    : 2

TestB haproxy-1.8.30 + patch(2 only)
  Shorter HTML: 0
  HTTP 408    : 3

TestC haproxy-2.4.0
  Shorter HTML: 0
  HTTP 408    : 0

haproxy-2.4.0 ran without any issue,
but I found that the bytes_read column of
haproxy-2.4.0 was different from that of
haproxy-1.8.30 and haproxy-2.0.22 in the
normal cases.

Values of bytes_read:
  haproxy-1.8.30: 4864
  haproxy-2.0.22: 4837 (Also 2.3.X)
  haproxy-2.4.0 : 4832

I don't think the difference in bytes_read is
related to the shorter-html issue or the
HTTP 408 issue; I just noticed it, so I am
reporting it.

For now, I have run Tests A, B, and C once each.
I am planning to run each test two more times.

>> I also tried haproxy-2.0.22 and I didn't find
>> messages about shorter htmls but the number of
>> log messages is smaller than the number of HTTP
>> accesses.
>
>This should definitely not happen, but it obviously depends
>where they are counted.
>
>For example some aborted idempotent requests between haproxy
>and the server on a reused connection could lead to a silent
>close to let the client try again.
>
>In this case it is guaranteed by the compliance with the
>HTTP protocol that the server has not done anything that
>cannot be retried, and all haproxy can see there is a
>failure to process (while in fact it could just be caused
>by an abort in flight).
>
>So if you're missing very few logs it could be an explanation.
>Or you may simply be losing logs since they're sent as datagrams
>only in 2.0.

Yes, the difference in the number of logs was 3.
Your explanation hits the mark, thank you.

Kazu

On Tue, May 18, 2021 at 6:52 PM Willy Tarreau <[email protected]> wrote:

> Hello,
>
> On Mon, May 17, 2021 at 09:47:10AM +0900, Kazuhiro Takenaka wrote:
> > Hello
> >
> > This is my first post to this mailing list.
> > I am not good at using English, so feel free to ask me
> > if my text is hard to understand.
>
> Rest assured that the majority of participants here (me included) do
> not have English as their native language, so you're not special on
> this point. And I had absolutely no problem understanding all your
> explanations, your English is better than you seem to think :-)
>
> > I noticed haproxy-1.8 sometimes sent incomplete htmls to
> > clients when running haproxy with the attached haproxy.cfg
> > that uses multithread function.
> >
> > # I also attached content01.html and check.html that
> > # are deployed on backend servers.
> > # content01.html is used in the confirmation test
> > # described below, check.html is for health check
> > # purpose.
> >
> > In this case, the client receives a shorter html.
> >
> > I confirmed haproxy's log messages corresponded to the
> > shorter htmls; the following line is one of them.
> >
> > 2021-03-23T15:54:46.869626+09:00 lb01 [err] haproxy[703957]:
> > 192.168.1.73:60572 [23/Mar/2021:15:54:46.868] vs_http01 rs_http01web/web01
> > 0/0/0/0/0 200 2896 - - SD-- 12/12/7/2/0 0/0 "GET /content01.html?x=170026897 HTTP/1.1"
>
> So it indicates an immediate close or error that is being detected on
> the server side just after two TCP segments. This can have some
> importance because depending how the server's TCP stack is tuned, it
> is very likely that it will only send two segments before growing its
> window, leaving a short blank period after them which can allow haproxy
> to detect an error.
>
> (...)
> > The following list is the frequency of abnormal access
> > when a total of 10 million accesses are made in 20 parallel
> > curl processes.
> >
> > status_code bytes_read occurence
> > 200         4344       1
> => exactly 3 segments
> > 200         2896       9
> => exactly 2 segments
> > 200         1448       6
> => exactly 1 segment
>
> It makes me think about a server which closes with a TCP reset after
> the data. It then becomes extremely timing-dependent and could explain
> why the size is a multiple of the number of segments.
>
> > 408         216        2
> >
> > The following line is the log messages
> > in the case of 408.
> >
> > 2021-03-23T16:02:42.444084+09:00 lb01 [err] haproxy[703957]:
> > 192.168.1.73:37052 [23/Mar/2021:16:02:32.472] vs_http01 vs_http01/<NOSRV>
> > -1/-1/-1/-1/10000 408 212 - - cR-- 14/14/0/0/0 0/0 "<BADREQ>"
>
> So the client's request didn't make it into haproxy.
>
> > When I first encountered this behavior, I was using haproxy-1.8.25 shipped
> > with RHEL8.3. So I obtained haproxy-1.8.30 from http://git.haproxy.org/,
> > built it, ran the test, and got the same sort of result.
> >
> > This behavior didn't happen without using multithread function.
>
> So that definitely indicates a race condition somewhere.
>
> > Next, I tried on haproxy-2.0.0 and confirmed it ran normally
> > without this behavior.
> >
> > Then I picked several versions of haproxy
> > between 1.8.0 and 2.0.0, built them, tested them,
> > and found the commit below resolved this behavior.
> >
> > ===================================================
> > commit 524344b4e0434b86d83869ef051f98d24505c08f
> > Author: Olivier Houchard <[email protected]>
> > Date:   Wed Sep 12 17:12:53 2018 +0200
> >
> >     MEDIUM: connections: Don't reset the polling flags in conn_fd_handler().
> >
> >     Resetting the polling flags at the end of conn_fd_handler() shouldn't be
> >     needed anymore, and it will create problem when we won't handle send/recv
> >     from conn_fd_handler() anymore.
> >
> > diff --git a/src/connection.c b/src/connection.c
> > index ab32567b..e303f2c3 100644
> > --- a/src/connection.c
> > +++ b/src/connection.c
> > @@ -203,9 +203,6 @@ void conn_fd_handler(int fd)
> >             conn->mux->wake(conn) < 0)
> >                 return;
> >
> > -       /* remove the events before leaving */
> > -       fdtab[fd].ev &= FD_POLL_STICKY;
> > -
> >         /* commit polling changes */
> >         conn->flags &= ~CO_FL_WILL_UPDATE;
> >         conn_cond_update_polling(conn);
> > ===================================================
> >
> > So I commented out "fdtab[fd].ev &= FD_POLL_STICKY;"
> > from both haproxy-1.8.25 and haproxy-1.8.30,
> > and the behavior was resolved.
>
> This is very strange. I could almost have understood the opposite, i.e.
> remove an activity report before leaving so that we don't wake up, but
> here *not* removing the flags indicates that we're leaving the FD
> reports for another call.
>
> > I don't know why this commit resolves the behavior,
> > I just tried and ran the test.
>
> What I *suspect* is happening is the following:
>
>   - the server closes with a TCP reset
>   - sometimes for an unknown reason we do not process the event
>     before leaving this function
>   - we then flush the activity flags
>   - the reset flag is still there, reported as POLLERR, forcing
>     an immediate subsequent wakeup
>   - the I/O handler sees POLLERR without POLLIN and concludes this is
>     a hard error and doesn't even bother to try to read, resulting in
>     the loss of the pending segments.
>
> Then commenting out the code above would make the I/O handler still
> see POLLIN with POLLERR and thus continue to read regardless.
>
> > My question is:
> >   Does this commit make side effects such like
> >   data structure corruption or memory leak on haproxy-1.8?
>
> It will not cause such issues but I'm a bit scared to take it into
> 1.8, as I don't know if it can cause spurious wakeups leading to 100%
> CPU on certain conditions like a buffer full for example.
>
> There are three patches that you could try to see if they address the
> issue. The first one would consist in not looking at close/error early
> in raw_sock_to_buf():
>
> diff --git a/src/raw_sock.c b/src/raw_sock.c
> index ad0210105..52db9cd5c 100644
> --- a/src/raw_sock.c
> +++ b/src/raw_sock.c
> @@ -264,7 +264,7 @@ static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int coun
>         conn_refresh_polling_flags(conn);
>         errno = 0;
>
> -       if (unlikely(!(fdtab[conn->handle.fd].ev & FD_POLL_IN))) {
> +       if (0 && unlikely(!(fdtab[conn->handle.fd].ev & FD_POLL_IN))) {
>                 /* stop here if we reached the end of data */
>                 if ((fdtab[conn->handle.fd].ev &
> (FD_POLL_ERR|FD_POLL_HUP)) == FD_POLL_HUP)
>                         goto read0;
>
> A second one would make sure we always try to read everything:
>
> diff --git a/src/raw_sock.c b/src/raw_sock.c
> index ad0210105..fbf20ae35 100644
> --- a/src/raw_sock.c
> +++ b/src/raw_sock.c
> @@ -302,7 +302,7 @@ static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int coun
>                 if (ret > 0) {
>                         buf->i += ret;
>                         done += ret;
> -                       if (ret < try) {
> +                       if (0 && ret < try) {
>                                 /* unfortunately, on level-triggered events, POLL_HUP
>                                  * is generally delivered AFTER the system buffer is
>                                  * empty, unless the poller supports POLL_RDHUP. If
>
> A third one would consist in not clearing the IN flag if ERR is reported:
>
> diff --git a/src/connection.c b/src/connection.c
> index fa04c024e..10bf65db2 100644
> --- a/src/connection.c
> +++ b/src/connection.c
> @@ -193,7 +193,8 @@ void conn_fd_handler(int fd)
>                 return;
>
>         /* remove the events before leaving */
> -       fdtab[fd].ev &= FD_POLL_STICKY;
> +       if (!(fdtab[fd].ev & FD_POLL_ERR))
> +               fdtab[fd].ev &= FD_POLL_STICKY;
>
>         /* commit polling changes */
>         conn->flags &= ~CO_FL_WILL_UPDATE;
>
> This one will make sure that we don't lose the info about the presence
> of input data when an error is reported. I'm just half-happy with this
> to be honest as it would not be very much different from the one you
> pointed before.
>
> I wouldn't be surprised if the second one could be sufficient and would
> even be fine with applying that to 1.8 as it's only a small pessimisation
> of a possible optimization. That's the only place I'm seeing where we
> could possibly not process all available data in the first round before
> having a look at that ERR flag. It would just look odd that an incomplete
> response is returned in response to recv() here.
>
> > I also tried haproxy-2.0.22 and I didn't find
> > messages about shorter htmls but the number of
> > log messages is smaller than the number of HTTP
> > accesses.
>
> This should definitely not happen, but it obviously depends where they are
> counted. For example some aborted idempotent requests between haproxy and
> the server on a reused connection could lead to a silent close to let the
> client try again. In this case it is guaranteed by the compliance with the
> HTTP protocol that the server has not done anything that cannot be retried,
> and all haproxy can see there is a failure to process (while in fact it
> could just be caused by an abort in flight). So if you're missing very few
> logs it could be an explanation. Or you may simply be losing logs since
> they're sent as datagrams only in 2.0.
>
> > I consider this to be a separate issue.
> >
> > Neither haproxy-2.3.0 nor haproxy-2.3.10 shows the
> > shorter-html issue or the insufficient-log-messages issue.
>
> Great! There were some changes between 2.0 and 2.3 which could help with
> the output buffering and possibly reduce the number of losses on the log
> path (all this to be taken with a grain of salt). If you're interested,
> 2.4.0 was released last week and is a long term supported version.
>
> Thanks a lot for your in-depth analysis!
> Willy
>
