Hi guys,

> > I faced a problem dealing with an L4 (TCP mode) haproxy-based proxy in
> > front of Graphite's metrics-receiving component, with clients that
> > connect just to send one or two Graphite metrics and disconnect right
> > after.
> >
> > It looks like this:
> > 1. Client connects to haproxy (SYN/SYN-ACK/ACK)
> > 2. Client sends one line of metric
> > 3. Haproxy acknowledges receiving this line (ACK to client)
> > 4. Client disconnects (FIN, FIN-ACK)
> > 5. Haproxy writes the CC termination state (1/-1/0/0) to the log
> >    without even trying to connect to a backend and send the client's
> >    data to it.
> > 6. Metric is lost :(
> >
> > If the client is slow enough between steps 1 and 2, or it sends a
> > bunch of metrics so haproxy has time to connect to a backend,
> > everything works like a charm.

> The issue though is the client disconnect. If we delay the client
> disconnect, it could work. Try playing with tc by delaying the
> incoming FIN packets for a few hundred milliseconds (make sure you
> only apply this to this particular traffic, for example this
> particular destination port).
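For anyone wanting to experiment with that tc idea, a rough sketch could look like the following. The interface name (eth0), the Graphite plaintext port (2003) and the 300 ms delay are assumptions, not values from the original report; also note that netem only shapes egress traffic, so incoming packets have to be redirected through an ifb device first:

```shell
# Sketch only: delay incoming FIN packets to one destination port.
# Assumptions: interface eth0, destination port 2003, 300ms delay,
# and a 20-byte IP header (no IP options) for the fixed TCP offsets.
modprobe ifb numifbs=1
ip link set dev ifb0 up

# netem is egress-only, so redirect all ingress traffic from eth0 to ifb0
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 \
    action mirred egress redirect dev ifb0

# on ifb0, band 3 of a prio qdisc gets the netem delay
tc qdisc add dev ifb0 root handle 1: prio
tc qdisc add dev ifb0 parent 1:3 handle 30: netem delay 300ms

# send only TCP packets to dport 2003 with the FIN bit set into band 3;
# the TCP flags byte sits at offset 33 (20-byte IP header + 13), FIN = 0x01
tc filter add dev ifb0 parent 1: protocol ip u32 \
    match ip protocol 6 0xff \
    match ip dport 2003 0xffff \
    match u8 0x01 0x01 at 33 \
    flowid 1:3
```

Everything else keeps flowing through the default prio bands undelayed; only the matching FIN segments take the 300 ms detour.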
In fact it's not that black-or-white. A client disconnecting first in TCP
is *always* a protocol design issue, because it leaves the source port in
TIME_WAIT on the client side for 1 minute (even 4 on certain legacy
stacks), and once all source ports are blocked like this, the client
cannot establish new connections anymore. However, this is a situation we
*normally* deal with in haproxy:

  - in TCP, we're *supposed* to respect exactly this sequence, and do the
    same on the other side, since it might be the only way to pass the
    protocol end-to-end; there's even a series of tests for this one in
    the old test-fsm.cfg;

  - in HTTP, we normally pass the request as-is and prepare for closing
    after delivering the response (since some clients are just netcat
    scripts). But it's well known that in HTTP, a FIN from a client after
    the request and before the response usually corresponds to a browser
    closing because the user clicked "stop" or closed a tab. For this
    reason there's an option "abortonclose" which is used to abort the
    request before passing it to the other side, or while it's still
    waiting for the connection to establish.

It turns out that this "abortonclose" option also works for TCP and
totally makes sense there for a number of protocols. Thus, one possible
explanation is that this option is present in the original config (maybe
even inherited from the defaults section), in which case this is the
desired behavior. It would also correspond to the CC log output (client
closed during connect).

But it's also possible that we broke something again. This half-closed
client situation was broken a few times in the past because it doesn't
get enough love. It essentially corresponds to a denial-of-service
attempt and rarely to normal behavior, and is rarely tested from this
last perspective.
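To illustrate the inheritance point: if "option abortonclose" sits in a defaults section, every proxy below it picks it up unless explicitly negated. A minimal sketch (the proxy name, addresses and ports are made up for the example, not taken from the original config):

```haproxy
defaults
    mode tcp
    option abortonclose     # inherited by all proxies below; aborts when
                            # the client closes before the server-side
                            # connection is established
    timeout connect 5s
    timeout client  30s
    timeout server  30s

listen graphite             # hypothetical names/addresses
    bind :2003
    no option abortonclose  # negate the inherited option for this proxy
                            # so early client FINs don't drop the data
    server g1 127.0.0.1:2004
```

Checking whether removing the option (or negating it like this) changes the observed behavior would quickly confirm or rule out this explanation.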
In addition, the idea of leaving blocked source ports behind doesn't sound
appealing to anyone for a reg-test :-/

> In TCP mode, we need to propagate the close from one side to the
> other, as we are not aware of the protocol. Not sure if it is possible
> (or a good idea) to keep sending buffer contents to the backend server
> when the client is already gone.

It's expected to work and is indeed not a good idea at the same time,
because this forces haproxy to consume all of its source ports very
quickly and makes it trivial for a client to block all of its outgoing
communications by maintaining a load of only ~500 connections per second
(with roughly 28k ephemeral ports and a 60-second TIME_WAIT, about 470
connections per second is enough to cycle through them all). Once this is
assumed, however, it must be possible (barring any bug, again).

> "[no] option abortonclose" only affects HTTP, according to the docs.

I'm pretty sure it's not limited to HTTP, because I've met PR_O_ABRT_CLOSE
or something like this quite a few times in the connection setup code.
However, it's very possible that the doc isn't clear about this or only
focuses on HTTP, since that's where it usually matters.

Hoping this helps,
Willy