Hi guys,

> > I faced a problem with an L4 (TCP mode) haproxy-based proxy in front of
> > Graphite's metrics-receiving component, with clients that connect just
> > to send one or two Graphite metrics and disconnect right after.
> >
> > It looks like this
> > 1. Client connects to haproxy (SYN/SYN-ACK/ACK)
> > 2. Client sends one line of metric
> > 3. Haproxy acknowledges receiving this line (ACK to client)
> > 4. Client disconnects (FIN, FIN-ACK)
> > 5. Haproxy writes a 1/-1/0/0 CC termination state to the log without
> > even trying to connect to a backend and forward the client's data to it.
> > 6. Metric is lost :(
> >
> > If the client is slow enough between steps 1 and 2, or if it sends a
> > bunch of metrics so that haproxy has time to connect to a backend,
> > everything works like a charm.
> 
> The issue though is the client disconnect. If we delay the client
> disconnect, it could work. Try playing with tc by delaying the
> incoming FIN packets for a few hundred milliseconds (make sure you
> only apply this to this particular traffic, for example this
> particular destination port).
> 

In fact it's not that black-or-white. A client disconnecting first
in TCP is *always* a protocol design issue, because it leaves the
source port in TIME_WAIT on the client side for 1 minute (even 4 minutes
on certain legacy stacks), and once all source ports are blocked like
this, the client cannot establish new connections anymore.
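
To give an idea of the scale: with the common Linux ephemeral port range
of 32768-60999 (about 28k usable ports) and a 60-second TIME_WAIT, such
a client cannot sustain much more than 28000/60 ~ 470 connections per
second before running out of source ports, which is roughly the ~500
conns/s figure mentioned below.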

However, this is a situation we *normally* deal with in haproxy:

  - in TCP, we're *supposed* to respect exactly this sequence, and
    do the same on the other side since it might be the only way to
    pass the protocol end-to-end; there's even a series of tests
    for this one in the old test-fsm.cfg;

  - in HTTP, we normally pass the request as-is, and prepare for
    closing after delivering the response (since some clients are
    just netcat scripts).

But it's well known that in HTTP, a FIN from a client after the request
and before the response usually corresponds to the browser closing
because the user clicked "stop" or closed a tab. For this reason there's
an option "abortonclose" which is used to abort the request before
passing it to the other side, or while it's still waiting for a
connection to establish.

It turns out that this "abortonclose" option also works for TCP and
totally makes sense there for a number of protocols. Thus, one
possible explanation is that this option is present in the original
config (maybe even inherited from the defaults section), in which case
this is the desired behavior. It would also correspond to the CC log
output (client closed during connect).
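
For example, just to illustrate how it could sneak in (a hypothetical
config, section and server names invented), an abortonclose inherited
from a defaults section would look like this:

    defaults
        mode http
        option abortonclose       # set once here ...
        timeout connect 5s
        timeout client  30s
        timeout server  30s

    listen graphite
        mode tcp                  # ... and still inherited here
        bind :2003                # assuming carbon's plaintext port
        server carbon1 127.0.0.1:2004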

But it's also possible that we broke something again. This half-closed
client situation was broken a few times in the past because it doesn't
get enough love: it essentially corresponds to a denial-of-service
attempt and only rarely to normal behavior, so it's rarely tested from
this last perspective. In addition, the idea of leaving blocked source
ports behind doesn't sound appealing to anyone for a reg-test :-/

> In TCP mode, we need to propagate the close from one side to the
> other, as we are not aware of the protocol. Not sure if it is possible
> (or a good idea) to keep sending buffer contents to the backend server
> when the client is already gone.

It's expected to work, and at the same time it's indeed not a good idea,
because it forces haproxy to consume all of its source ports very
quickly and makes it trivial for a client to block all of haproxy's
outgoing communications by maintaining a load of only ~500 connections
per second. Once this is accepted, however, it must be possible (barring
any bug, again).

> "[no] option abortonclose" only affects HTTP, according to the docs.

I'm pretty sure it's not limited to HTTP because I've met PR_O_ABRT_CLOSE
or something like this quite a few times in the connection setup code.
However it's very possible that the doc isn't clear about this, or only
focuses on HTTP since that's where it usually matters.
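
So if the option does come from a defaults section, an easy test would
be to explicitly turn it off in the TCP proxy only, along these lines
(untested sketch, same invented names as above):

    listen graphite
        mode tcp
        bind :2003
        no option abortonclose    # undo whatever defaults may have set
        server carbon1 127.0.0.1:2004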

Hoping this helps,
Willy
