Thank you very much, Willy! Turning off abortonclose (it was enabled globally) for this particular session really helped :)
--
Best regards,
Maksim

On Tue, 9 Feb 2021 at 17:46, Willy Tarreau <w...@1wt.eu> wrote:
> Hi guys,
>
> > > I faced a problem with an L4 (TCP mode) haproxy-based proxy in front of
> > > Graphite's component that receives metrics from clients, with clients
> > > that connect just to send one or two Graphite metrics and disconnect
> > > right after.
> > >
> > > It looks like this:
> > > 1. Client connects to haproxy (SYN/SYN-ACK/ACK)
> > > 2. Client sends one metric line
> > > 3. Haproxy acknowledges receiving this line (ACK to client)
> > > 4. Client disconnects (FIN, FIN-ACK)
> > > 5. Haproxy writes 1/-1/0/0 with the CC termination state to the log,
> > >    without even trying to connect to a backend and send the client's
> > >    data to it.
> > > 6. Metric is lost :(
> > >
> > > If the client is slow enough between steps 1 and 2, or it sends a bunch
> > > of metrics so that haproxy has time to connect to a backend, everything
> > > works like a charm.
> >
> > The issue though is the client disconnect. If we delay the client
> > disconnect, it could work. Try playing with tc by delaying the
> > incoming FIN packets for a few hundred milliseconds (make sure you
> > only apply this to this particular traffic, for example this
> > particular destination port).
>
> In fact it's not that black-or-white. A client disconnecting first
> in TCP is *always* a protocol design issue, because it leaves the
> source port in TIME_WAIT on the client side for 1 minute (or even 4
> minutes on certain legacy stacks), and once all source ports are
> blocked like this, the client cannot establish new connections anymore.
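The client pattern in steps 1 to 4 of the sequence above can be reproduced with a few lines of Python. This is a minimal sketch, not the actual client from the report: it assumes Graphite's plaintext line format (`<path> <value> <timestamp>\n`), and `send_one_metric`, `stand_in_receiver` and `demo` are hypothetical names. Pointing it at a local stand-in receiver instead of haproxy shows that the metric line is already in the peer's receive buffer before the FIN, so the data itself is not lost by TCP; it only disappears if the proxy decides to drop it.

```python
import socket
import threading
import time

def send_one_metric(host: str, port: int, path: str, value: float) -> bytes:
    """Steps 1-4: connect, send a single plaintext metric line, close at once."""
    # Assumed Graphite plaintext format: "<path> <value> <timestamp>\n"
    line = f"{path} {value} {int(time.time())}\n".encode()
    with socket.create_connection((host, port)) as sock:  # step 1: TCP handshake
        sock.sendall(line)                                 # step 2: one metric line
    # Leaving the "with" block closes the socket immediately (step 4: FIN).
    return line

def stand_in_receiver(srv: socket.socket, received: list) -> None:
    """Hypothetical stand-in for the receiving side: read one connection to EOF."""
    conn, _ = srv.accept()
    with conn:
        data = bytearray()
        while chunk := conn.recv(4096):  # b"" (the client's FIN) ends the loop
            data += chunk
        received.append(bytes(data))

def demo() -> bool:
    srv = socket.create_server(("127.0.0.1", 0))
    received: list = []
    reader = threading.Thread(target=stand_in_receiver, args=(srv, received))
    reader.start()
    sent = send_one_metric("127.0.0.1", srv.getsockname()[1], "test.metric", 42)
    reader.join()
    srv.close()
    # The line is delivered intact even though the client closed first.
    return received == [sent]

if __name__ == "__main__":
    print(demo())
```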
>
> However, this is a situation we *normally* deal with in haproxy:
>
> - in TCP, we're *supposed* to respect exactly this sequence, and
>   do the same on the other side since it might be the only way to
>   pass the protocol from end to end; there's even a series of
>   tests for this one in the old test-fsm.cfg;
>
> - in HTTP, we normally pass the request as-is, and prepare for
>   closing after delivering the response (since some clients are
>   just netcat scripts).
>
> But it's well known that in HTTP, a FIN from a client after the request
> and before the response usually corresponds to a browser closing because
> the user clicked "stop" or closed a tab. For this reason there's an
> option "abortonclose" which is used to abort the request before passing
> it to the other side, or while it's still waiting for a connection to
> establish.
>
> It turns out that this "abortonclose" option also works for TCP and
> totally makes sense there for a number of protocols. Thus, one
> possible explanation is that this option is present in the original
> config (maybe even inherited from the defaults section), in which case
> this is the desired behavior. It would also correspond to the CC log
> output (client closed during connect).
>
> But it's also possible that we broke something again. This half-closed
> client situation was broken a few times in the past because it doesn't
> get enough love. It essentially corresponds to a denial-of-service
> attempt and rarely to normal behavior, and it is rarely tested from this
> last perspective. In addition, the idea of leaving blocked source ports
> behind doesn't sound appealing to anyone for a reg-test :-/
>
> > In TCP mode, we need to propagate the close from one side to the
> > other, as we are not aware of the protocol. Not sure if it is possible
> > (or a good idea) to keep sending buffer contents to the backend server
> > when the client is already gone.
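The two outcomes discussed above (dropping the request because the client closed, versus forwarding the buffered bytes and then propagating the close) can be contrasted with a toy relay. This is a hypothetical sketch, not haproxy's implementation: `relay_once`, `backend_collect`, `demo` and the `abort_on_close` flag are illustrative names, and the flag only mimics the effect the thread attributes to "option abortonclose". To reproduce the race from the report deterministically, the relay reads the client to EOF before it dials the backend.

```python
import socket
import threading

def relay_once(listener: socket.socket, backend_addr: tuple, abort_on_close: bool) -> bool:
    """Relay one client's bytes to a backend; return True if they were forwarded.

    Simplified model of the race in the report: the client's data and FIN
    both arrive before this relay dials the backend.
    """
    client, _ = listener.accept()
    with client:
        pending = bytearray()
        while chunk := client.recv(4096):  # buffer until the client's FIN (b"")
            pending += chunk
    if abort_on_close:
        # Mimics the behaviour described for "option abortonclose": the client
        # is gone, so the request is dropped before any backend connection.
        return False
    with socket.create_connection(backend_addr) as backend:
        backend.sendall(pending)           # deliver the already-buffered bytes
        backend.shutdown(socket.SHUT_WR)   # then propagate the close (our FIN)
    return True

def backend_collect(srv: socket.socket, out: list, timeout: float = 1.0) -> None:
    """Hypothetical backend: record what one connection sent, or None if none came."""
    srv.settimeout(timeout)
    try:
        conn, _ = srv.accept()
    except socket.timeout:
        out.append(None)
        return
    with conn:
        data = bytearray()
        while chunk := conn.recv(4096):
            data += chunk
        out.append(bytes(data))

def demo(abort_on_close: bool):
    backend_srv = socket.create_server(("127.0.0.1", 0))
    front = socket.create_server(("127.0.0.1", 0))
    got: list = []
    forwarded: list = []
    b = threading.Thread(target=backend_collect, args=(backend_srv, got))
    r = threading.Thread(
        target=lambda: forwarded.append(
            relay_once(front, backend_srv.getsockname(), abort_on_close)))
    b.start()
    r.start()
    with socket.create_connection(front.getsockname()) as c:
        c.sendall(b"test.metric 42 1612881960\n")  # one line, then close at once
    r.join()
    b.join()
    backend_srv.close()
    front.close()
    return got[0]

if __name__ == "__main__":
    print(demo(abort_on_close=False))  # the metric line reaches the backend
    print(demo(abort_on_close=True))   # None: nothing was forwarded
```

Calling `shutdown(SHUT_WR)` after forwarding is one way a protocol-agnostic TCP relay can pass a half-close from one side to the other. Note that the relay then closes first towards the backend, so its own source port ends up in TIME_WAIT; that is the cost discussed in the thread, and it is not modelled here.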
>
> It's expected to work and is indeed not a good idea at the same time,
> because this forces haproxy to consume all of its source ports very
> quickly and makes it trivial for a client to block all of its outgoing
> communications by maintaining a load of only ~500 connections per second.
> Once this is assumed however, it must be possible (barring any bug, again).
>
> > "[no] option abortonclose" only affects HTTP, according to the docs.
>
> I'm pretty sure it's not limited to HTTP because I've come across
> PR_O_ABRT_CLOSE or something like this quite a few times in the
> connection setup code. However it's very possible that the doc isn't
> clear about this or only focuses on HTTP since it's where this usually
> matters.
>
> Hoping this helps,
> Willy
>
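The figure of roughly 500 connections per second follows from the TIME_WAIT duration mentioned earlier in the thread. A back-of-the-envelope sketch, assuming Linux's default ephemeral port range (`net.ipv4.ip_local_port_range` of 32768 to 60999, i.e. 28,232 ports), a 60-second TIME_WAIT, and a single source address talking to a single backend address and port; `max_sustained_rate` is a hypothetical helper:

```python
def max_sustained_rate(port_low: int, port_high: int, time_wait_s: float) -> float:
    """Connection rate at which every ephemeral source port sits in TIME_WAIT."""
    usable_ports = port_high - port_low + 1  # inclusive port range
    return usable_ports / time_wait_s

if __name__ == "__main__":
    # Assumed Linux defaults: ip_local_port_range "32768 60999", 60 s TIME_WAIT.
    print(f"{max_sustained_rate(32768, 60999, 60.0):.0f} connections/s")   # 471
    # Legacy stacks holding TIME_WAIT for 4 minutes run out at a quarter of that.
    print(f"{max_sustained_rate(32768, 60999, 240.0):.0f} connections/s")  # 118
```

Above that rate, new outgoing connections to the same destination start failing until older ports leave TIME_WAIT.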