On Tue, May 18, 2021 at 5:21 AM Willy Tarreau <w...@1wt.eu> wrote:

> Hello,
>
> On Mon, May 17, 2021 at 09:47:10AM +0900, Kazuhiro Takenaka wrote:
> > Hello
> >
> > This is my first post to this mailing list.
> > I am not good at using English, so feel free to ask me
> > if my text is hard to understand.
>
> Rest assured that the majority of participants here (me included) do
> not have English as their native language, so you're not special on
> this point. And I had absolutely no problem understanding all your
> explanations, your English is better than you seem to think :-)
>
> > I noticed haproxy-1.8 sometimes sent incomplete htmls to
> > clients when running haproxy with the attached haproxy.cfg
> > that uses the multithread feature.
> >
> > # I also attached content01.html and check.html, which
> > # are deployed on the backend servers.
> > # content01.html is used in the confirmation test
> > # described below; check.html is for health check
> > # purposes.
> >
> > In this case, the client receives a shorter html.
> >
> > I confirmed haproxy's log messages corresponded to the
> > shorter html; the following line is one of them.
> >
> > 2021-03-23T15:54:46.869626+09:00 lb01 [err] haproxy[703957]:
> > 192.168.1.73:60572 [23/Mar/2021:15:54:46.868] vs_http01 rs_http01web/web01
> > 0/0/0/0/0 200 2896 - - SD-- 12/12/7/2/0 0/0
> > "GET /content01.html?x=170026897 HTTP/1.1"
>

This is exactly the same problem I reported on the mailing list a couple of
weeks ago. I accidentally replied off-list to Kazuhiro but will continue
the conversation here.

> So it indicates an immediate close or error that is being detected on
> the server side just after two TCP segments. This can have some
> importance because depending on how the server's TCP stack is tuned, it
> is very likely that it will only send two segments before growing its
> window, leaving a short blank period after them which can allow haproxy
> to detect an error.
>

This is interesting.


> (...)
> > The following list is the frequency of abnormal accesses
> > when a total of 10 million requests are made by 20 parallel
> > curl processes.
> >
> > status_code bytes_read occurrence
> > 200         4344       1
> => exactly 3 segments
> > 200         2896       9
> => exactly 2 segments
> > 200         1448       6
> => exactly 1 segment
>
> It makes me think about a server which closes with a TCP reset after
> the data. It then becomes extremely timing-dependent and could explain
> why the received size is a whole number of segments.
>
> > 408         216        2
> >
> > The following line is the log message
> > in the case of 408.
> >
> > 2021-03-23T16:02:42.444084+09:00 lb01 [err] haproxy[703957]:
> > 192.168.1.73:37052 [23/Mar/2021:16:02:32.472] vs_http01 vs_http01/<NOSRV>
> > -1/-1/-1/-1/10000 408 212 - - cR-- 14/14/0/0/0 0/0 "<BADREQ>"
>
> So the client's request didn't make it into haproxy.
>
> > When I first encountered this behavior, I was using haproxy-1.8.25 shipped
> > with RHEL8.3. So I obtained haproxy-1.8.30 from http://git.haproxy.org/,
> > built it, ran the test and got the same sort of result.
> >
> > This behavior didn't happen without the multithread feature.
>
> So that definitely indicates a race condition somewhere.
>

In my case it was also reported that nbthread must be set in order to
trigger the HTTP 200 response with the SD-- termination state, along with
the shortened response size. Everything about this bug seems exactly
in line with what I am seeing.


> > Next, I tried haproxy-2.0.0 and confirmed it ran normally
> > without this behavior.
> >
> > Then I picked up several versions of haproxy
> > between 1.8.0 and 2.0.0, built them, tested them,
> > and found the commit below resolved this behavior.
> >
> > ===================================================
> > commit 524344b4e0434b86d83869ef051f98d24505c08f
> > Author: Olivier Houchard <ohouch...@haproxy.com>
> > Date:   Wed Sep 12 17:12:53 2018 +0200
> >
> >     MEDIUM: connections: Don't reset the polling flags in conn_fd_handler().
> >
> >     Resetting the polling flags at the end of conn_fd_handler() shouldn't be
> >     needed anymore, and it will create problem when we won't handle send/recv
> >     from conn_fd_handler() anymore.
> >
> > diff --git a/src/connection.c b/src/connection.c
> > index ab32567b..e303f2c3 100644
> > --- a/src/connection.c
> > +++ b/src/connection.c
> > @@ -203,9 +203,6 @@ void conn_fd_handler(int fd)
> >             conn->mux->wake(conn) < 0)
> >                 return;
> >
> > -       /* remove the events before leaving */
> > -       fdtab[fd].ev &= FD_POLL_STICKY;
> > -
> >         /* commit polling changes */
> >         conn->flags &= ~CO_FL_WILL_UPDATE;
> >         conn_cond_update_polling(conn);
> > ===================================================
> >
> > So I commented out "fdtab[fd].ev &= FD_POLL_STICKY;"
> > from both haproxy-1.8.25 and haproxy-1.8.30,
> > and the behavior was resolved.
>
> This is very strange. I could almost have understood the opposite, i.e.
> remove an activity report before leaving so that we don't wake up, but
> here *not* removing the flags indicates that we're leaving the FD
> reports for another call.
>
> > I don't know why this commit resolves the behavior;
> > I just tried it and ran the test.
>
> What I *suspect* is happening is the following:
>
>   - the server closes with a TCP reset
>   - sometimes for an unknown reason we do not process the event
>     before leaving this function
>   - we then flush the activity flags
>   - the reset flag is still there, reported as POLLERR, forcing
>     an immediate subsequent wakeup
>   - the I/O handler sees POLLERR without POLLIN and concludes this is
>     a hard error and doesn't even bother to try to read, resulting in
>     the loss of the pending segments.
>

In my original message, I included a portion of the tcpdump, and the RST
packet is being sent by haproxy to the server. I never see a TCP reset
from the server itself.

Under Wireshark, I can see that the HTTP response is a total of 3 segments,
and as far as I can tell all three segments were received by haproxy.
Pardon if this next bit doesn't make sense, but the third and final segment
is shown under Wireshark as the HTTP response itself. In other words:

Segment #1: 2896 bytes
Segment #2: 1448 bytes
Segment #3: 1059 bytes <- this final frame also includes the HTTP response header

When the problem occurs, haproxy logs a termination state of "SD--" and the
byte count is short by exactly the size of the last segment (1059 in this
case). I think this is well established here but I just wanted to be clear.
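
To make sure I am reading the scenario correctly, here is a rough standalone
sketch of my own (hypothetical handler names, not haproxy code) contrasting an
I/O handler that treats an error report without a read report as fatal with one
that drains the socket first and only then looks at the error:

/* Illustration only (hypothetical sketch, not taken from the haproxy
 * sources): how an I/O handler can lose data that arrived just before a
 * RST, depending on whether it drains the socket before acting on the
 * error. */
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* "Lossy" variant: treat EPOLLERR without EPOLLIN as a hard error and never
 * call recv(), so whatever the kernel had already buffered is dropped. */
ssize_t handle_event_lossy(int fd, uint32_t events, char *buf, size_t len)
{
        if ((events & EPOLLERR) && !(events & EPOLLIN))
                return -1;      /* data still queued on the socket is lost */
        return recv(fd, buf, len, 0);
}

/* "Draining" variant: read until the buffer is full, the peer has closed,
 * or recv() fails, and only give up when there is nothing left to salvage. */
ssize_t handle_event_draining(int fd, uint32_t events, char *buf, size_t len)
{
        size_t total = 0;
        ssize_t ret = 0;

        (void)events;           /* read first, look at the error afterwards */

        while (total < len && (ret = recv(fd, buf + total, len - total, 0)) > 0)
                total += (size_t)ret;

        if (ret >= 0 || errno == EAGAIN || total > 0)
                return (ssize_t)total;  /* hand back what was received */
        return -1;              /* nothing received and a real error */
}

If I follow correctly, the gist of the fix is to keep behaving like the second
function even when the poller flags an error on the connection.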



> Then commenting out the code above would make the I/O handler still
> see POLLIN with POLLERR and thus continue to read regardless.
>
> > My question is:
> >   Does this commit have side effects such as
> >   data structure corruption or memory leaks on haproxy-1.8?
>
> It will not cause such issues but I'm a bit scared to take it into
> 1.8, as I don't know if it can cause spurious wakeups leading to 100%
> CPU on certain conditions like a buffer full for example.
>
> There are three patches that you could try to see if they address the
> issue. The first one would consist in not looking at close/error early
> in raw_sock_to_buf():
>
> diff --git a/src/raw_sock.c b/src/raw_sock.c
> index ad0210105..52db9cd5c 100644
> --- a/src/raw_sock.c
> +++ b/src/raw_sock.c
> @@ -264,7 +264,7 @@ static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int coun
>         conn_refresh_polling_flags(conn);
>         errno = 0;
>
> -       if (unlikely(!(fdtab[conn->handle.fd].ev & FD_POLL_IN))) {
> +       if (0 && unlikely(!(fdtab[conn->handle.fd].ev & FD_POLL_IN))) {
>                 /* stop here if we reached the end of data */
>                 if ((fdtab[conn->handle.fd].ev & (FD_POLL_ERR|FD_POLL_HUP)) == FD_POLL_HUP)
>                         goto read0;
>
> A second one would make sure we always try to read everything:
>
> diff --git a/src/raw_sock.c b/src/raw_sock.c
> index ad0210105..fbf20ae35 100644
> --- a/src/raw_sock.c
> +++ b/src/raw_sock.c
> @@ -302,7 +302,7 @@ static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int coun
>                 if (ret > 0) {
>                         buf->i += ret;
>                         done += ret;
> -                       if (ret < try) {
> +                       if (0 && ret < try) {
>                                 /* unfortunately, on level-triggered events, POLL_HUP
>                                  * is generally delivered AFTER the system buffer is
>                                  * empty, unless the poller supports POLL_RDHUP. If
>
> A third one would consist in not clearing the IN flag if ERR is reported:
>
> diff --git a/src/connection.c b/src/connection.c
> index fa04c024e..10bf65db2 100644
> --- a/src/connection.c
> +++ b/src/connection.c
> @@ -193,7 +193,8 @@ void conn_fd_handler(int fd)
>                 return;
>
>         /* remove the events before leaving */
> -       fdtab[fd].ev &= FD_POLL_STICKY;
> +       if (!(fdtab[fd].ev & FD_POLL_ERR))
> +               fdtab[fd].ev &= FD_POLL_STICKY;
>
>         /* commit polling changes */
>         conn->flags &= ~CO_FL_WILL_UPDATE;
>
> This one will make sure that we don't lose the info about the presence
> of input data when an error is reported. I'm just half-happy with this
> to be honest as it would not be very much different from the one you
> pointed before.
>
> I wouldn't be surprised if the second one could be sufficient and would
> even be fine with applying that to 1.8 as it's only a small pessimisation
> of a possible optimization. That's the only place I'm seeing where we
> could possibly not process all available data in the first round before
> having a look at that ERR flag. It would just look odd that an incomplete
> response is returned in response to recv() here.
>

I can help test these in my environment. Thanks.
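
Also, if it would help to reproduce the "reset right behind the data" timing on
demand, I could point a test frontend at a throwaway backend that answers and
then aborts the connection with SO_LINGER set to 0, so close() turns into a
RST. Rough sketch below (purely hypothetical test server, not our real backend;
the port and response body are made up):

/* Hypothetical test backend: answers every request with a fixed HTTP
 * response, then closes with SO_LINGER=0 so close() aborts the connection
 * with a RST instead of a normal FIN. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        const char resp[] =
                "HTTP/1.1 200 OK\r\n"
                "Content-Length: 2\r\n"
                "Connection: close\r\n"
                "\r\n"
                "OK";
        struct sockaddr_in addr = { 0 };
        struct linger lin = { .l_onoff = 1, .l_linger = 0 };
        int one = 1;
        int srv = socket(AF_INET, SOCK_STREAM, 0);

        setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(8081);            /* made-up port */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            listen(srv, 128) < 0) {
                perror("bind/listen");
                return 1;
        }

        for (;;) {
                char req[4096];
                int cli = accept(srv, NULL, NULL);

                if (cli < 0)
                        continue;
                recv(cli, req, sizeof(req), 0); /* don't bother parsing */
                send(cli, resp, sizeof(resp) - 1, 0);
                /* linger=0: close() resets instead of sending a FIN */
                setsockopt(cli, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin));
                close(cli);
        }
}

The real responses obviously span several segments, so this is a
simplification, but it might show whether a RST landing right after the data
is enough to produce the SD-- short reads with nbthread set.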

Ryan



> > I also tried haproxy-2.0.22 and I didn't find
> > messages about shorter htmls, but the number of
> > log messages was smaller than the number of HTTP
> > accesses.
>
> This should definitely not happen, but it obviously depends where they are
> counted. For example some aborted idempotent requests between haproxy and
> the server on a reused connection could lead to a silent close to let the
> client try again. In this case it is guaranteed by the compliance with the
> HTTP protocol that the server has not done anything that cannot be retried,
> and all haproxy can see there is a failure to process (while in fact it
> could just be caused by an abort in flight). So if you're missing very few
> logs it could be an explanation. Or you may simply be losing logs since
> they're sent as datagrams only in 2.0.
>
> > I consider this to be a separate issue.
> >
> > Neither haproxy-2.3.0 nor haproxy-2.3.10 shows the
> > shorter html issue or the insufficient log messages issue.
>
> Great! There were some changes between 2.0 and 2.3 which could help with
> the output buffering and possibly reduce the number of losses on the log
> path (all this to be taken with a grain of salt). If you're interested,
> 2.4.0 was released last week and is a long term supported version.
>
> Thanks a lot for your in-depth analysis!
> Willy
>
>
