Hi Ilya,

On Wed, Feb 05, 2014 at 05:01:03PM -0800, Ilya Grigorik wrote:
> This is looking very promising! I created a simple page which loads a large
> image (~1.5MB), then onload fires, and after about 5s of wait, another
> image is fetched. All the assets are fetched over the same TCP connection.

Cool!

> - Sample WPT run:
> http://www.webpagetest.org/result/140206_R2_0eab5be9abebd600c17f199158782114/3/details/
> - tcpdump trace:
> http://cloudshark.org/captures/5092d680b992?filter=tcp.stream%3D%3D4

Thanks for the links.

> All requests begin with a 1440 byte records (configured
> as: tune.ssl.maxrecord=1400), and then get bumped to 16KB - awesome.

In my opinion you could even double this in order to fill 2 MSS at once,
since each client will accept at least 2 MSS in slow start. It will also
avoid some systems delaying ACK of the single segment.

> A couple of questions:
>
> (a) It's not clear to me how the threshold upgrade is determined? What
> triggers the record size bump internally?

The forwarding mechanism does two things:

  - the read side counts the number of consecutive iterations that read()
    filled the whole receive buffer. After 3 consecutive times, it
    considers that it's a streaming transfer and sets the flag CF_STREAMER
    on the communication channel.

  - after 2 incomplete reads, the flag disappears.

  - the send side detects the number of times it can send the whole buffer
    at once. It sets CF_STREAMER_FAST if it can flush the whole buffer 3
    times in a row.

  - after 2 incomplete writes, the flag disappears.

I preferred to only rely on CF_STREAMER and ignore the _FAST variant
because it would only favor high bandwidth clients (it's used to enable
splice() in fact). But I thought that CF_STREAMER alone would do the right
job. And your WPT test seems to confirm this, when we look at the bandwidth
usage!

> (b) If I understood your earlier comment correctly, HAProxy will
> automatically begin each new request with small record size... when it
> detects that it's a new request.

Indeed. In HTTP mode, it processes transactions (request+response), not
connections, and each new transaction starts in a fresh state where these
flags are cleared.

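To make the read-side rule above concrete, here is a toy model in Python.
It is only a sketch of the behaviour as described in this mail, not
haproxy's code (which is C): the class, its method names and the two
record-size constants are made up for illustration, and I assume that
"incomplete reads" means consecutive ones.

```python
# Hypothetical constants: the small size mirrors tune.ssl.maxrecord=1400
# from the test above, the large one is the 16KB record seen in the trace.
MAX_RECORD_SMALL = 1400
MAX_RECORD_LARGE = 16384


class StreamerDetector:
    """Toy model of the read-side CF_STREAMER heuristic (names are invented)."""

    def __init__(self, full_needed=3, partial_needed=2):
        self.full_needed = full_needed        # full reads needed to set the flag
        self.partial_needed = partial_needed  # incomplete reads needed to clear it
        self.full_reads = 0
        self.partial_reads = 0
        self.streamer = False                 # stands in for CF_STREAMER

    def on_read(self, nbytes, bufsize):
        """Account for one read() that returned nbytes into a bufsize buffer."""
        if nbytes >= bufsize:
            # The read filled the whole buffer: count it, break the partial run.
            self.full_reads += 1
            self.partial_reads = 0
            if self.full_reads >= self.full_needed:
                self.streamer = True
        else:
            # Incomplete read: count it, break the run of full reads.
            self.partial_reads += 1
            self.full_reads = 0
            if self.partial_reads >= self.partial_needed:
                self.streamer = False

    def new_transaction(self):
        """In HTTP mode each transaction starts with the flag cleared."""
        self.full_reads = 0
        self.partial_reads = 0
        self.streamer = False

    def record_size(self):
        """Small records until the transfer looks like a stream."""
        return MAX_RECORD_LARGE if self.streamer else MAX_RECORD_SMALL
```

Feeding it three full 16KB reads switches record_size() from 1400 to 16384,
two incomplete reads in a row bring it back, and new_transaction() resets
everything regardless of the counters.
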
> This works great if we're talking to a
> backend in "http" mode: we parse the HTTP/1.x protocol and detect when a
> new request is being processed, etc. However, what if I'm using HAProxy to
> terminate TLS (+alpn negotiate) and then route the data to a "tcp" mode
> backend.. which is my spdy / http/2 server talking over a non-encrypted
> channel.

Ah good point. I *suspect* that in practice it will work because:

  - the last segment of the first transfer will almost always be incomplete
    (you don't always transfer exact multiples of the buffer size);

  - the first response for the next request will almost always be
    incomplete (headers and not all data).

So if we're in this situation, this will be enough to reset the CF_STREAMER
flag (2 consecutive incomplete reads). I think it would be worth testing
it. A very simple way to test it in your environment would be to chain two
instances, one in TCP mode deciphering, and one in HTTP mode.

> In this instance this logic wouldn't work, since HAProxy doesn't
> have any knowledge or understanding of spdy / http/2 streams -- we'd start
> the entire connection with small records, but then eventually upgrade it to
> 16KB and keep it there, correct?

It's not kept, it really depends on the transfer sizes all along. It
matches more or less what you explained at the beginning of this thread,
but based on transfer sizes at the lower layers.

> Any clever solutions for this? And on that note, are there future plans to
> add "http/2" smarts to HAProxy, such that we can pick apart different
> streams within a session, etc?

Yes, I absolutely want to implement HTTP/2 but it will be time consuming
and we won't have this for 1.5 at all. I also don't want to implement SPDY
or overly early releases of 2.0, just because whatever we do will take a
lot of time. Haproxy is a low level component, and each protocol adaptation
is expensive to do.
Not as expensive as what people have to do with ASICs, but still harder
than what some other products can do by using a small lib to perform the
abstraction.

One of the huge difficulties we'll face will be to manage multiple streams
over one connection. I think it will change the current paradigm of how
requests are instantiated (a change which has already started). From the
very first version, we instantiated one "session" upon accept(), and this
session contains buffers on which analyzers are plugged. The HTTP parsers
are such analyzers. All the states and counters are stored at the session
level. In 1.5, we started to change a few things. A connection is
instantiated upon accept, then the session is allocated after the
connection is initialized (eg: SSL handshake complete). But splitting the
sessions between multiple requests will be quite complex. For example, I
fear that we'll have to always copy data because we'll have multiple
connections on one side and a single multiplexed one on the other side.

You can take a look at doc/internal/entities.pdf if you're interested.

Best regards,
Willy

