Hi Ilya,

On Wed, Feb 05, 2014 at 05:01:03PM -0800, Ilya Grigorik wrote:
> This is looking very promising! I created a simple page which loads a large
> image (~1.5MB), then onload fires, and after about 5s of wait, another
> image is fetched. All the assets are fetched over the same TCP connection.

Cool!

> - Sample WPT run:
> http://www.webpagetest.org/result/140206_R2_0eab5be9abebd600c17f199158782114/3/details/
> - tcpdump trace:
> http://cloudshark.org/captures/5092d680b992?filter=tcp.stream%3D%3D4

Thanks for the links.

> All requests begin with 1440-byte records (configured
> as: tune.ssl.maxrecord=1400), and then get bumped to 16KB - awesome.

In my opinion you could even double this in order to fill 2 MSS at once,
since every client will accept at least 2 MSS during slow start. It also
avoids the delayed ACK that some systems apply to a lone segment.
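
For example (a minimal sketch; 2800 simply assumes the same ~1400-byte
MSS as in your test):

    global
        tune.ssl.maxrecord 2800   # ~2*MSS: two full segments per record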

> A couple of questions:
> 
> (a) It's not clear to me how the threshold upgrade is determined? What
> triggers the record size bump internally?

The forwarding mechanism does two things (a rough C sketch follows the
list):

  - on the read side, it counts the number of consecutive iterations
    for which read() filled the whole receive buffer. After 3
    consecutive full reads, it considers the transfer a streaming
    transfer and sets the CF_STREAMER flag on the communication
    channel. After 2 incomplete reads, the flag is cleared.

  - on the send side, it counts the number of times it can flush the
    whole buffer at once. It sets CF_STREAMER_FAST if it can do so
    3 times in a row, and clears it after 2 incomplete writes.
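
If it helps to picture it, here is a rough C sketch of that hysteresis
(all names, fields and thresholds are illustrative, not haproxy's
actual internals):

    #define CF_STREAMER 0x1

    struct channel {
        unsigned int flags;   /* CF_STREAMER set when streaming detected */
        int full_reads;       /* consecutive reads that filled the buffer */
        int partial_reads;    /* consecutive incomplete reads */
    };

    /* to be called after each read(); <filled> is non-zero when read()
     * completely filled the receive buffer. The write side would do
     * the same with CF_STREAMER_FAST, based on full buffer flushes.
     */
    static void update_streamer(struct channel *chn, int filled)
    {
        if (filled) {
            chn->partial_reads = 0;
            if (++chn->full_reads >= 3)
                chn->flags |= CF_STREAMER;  /* streaming detected */
        } else {
            chn->full_reads = 0;
            if (++chn->partial_reads >= 2)
                chn->flags &= ~CF_STREAMER; /* back to interactive mode */
        }
    }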

I preferred to rely only on CF_STREAMER and ignore the _FAST variant,
because the latter would only favor high-bandwidth clients (it is in
fact used to enable splice()). But I thought that CF_STREAMER alone
would do the right job, and your WPT test seems to confirm this when
we look at the bandwidth usage!
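
In other words, the SSL send path only needs to look at that flag when
it sizes the next record. Roughly (a sketch, not the actual ssl_sock
code):

    /* choose the payload size of the next TLS record (sketch) */
    static size_t ssl_record_size(const struct channel *chn,
                                  size_t max_record)
    {
        if (chn->flags & CF_STREAMER)
            return 16384;       /* bulk transfer: full 16kB records */
        return max_record;      /* interactive: honor tune.ssl.maxrecord */
    }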

> (b) If I understood your earlier comment correctly, HAProxy will
> automatically begin each new request with small record size... when it
> detects that it's a new request.

Indeed. In HTTP mode, it processes transactions (request+response), not
connections, and each new transaction starts in a fresh state where these
flags are cleared.
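
Conceptually, each new transaction does something like this (same
illustrative fields as in the sketch above):

    /* start of a new HTTP transaction: forget the streaming history */
    chn->flags &= ~CF_STREAMER;
    chn->full_reads = chn->partial_reads = 0;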

> This works great if we're talking to a
> backend in "http" mode: we parse the HTTP/1.x protocol and detect when a
> new request is being processed, etc. However, what if I'm using HAProxy to
> terminate TLS (+alpn negotiate) and then route the data to a "tcp" mode
> backend.. which is my spdy / http/2 server talking over a non-encrypted
> channel.

Ah, good point. I *suspect* that in practice it will work, because:

  - the last segment of the first transfer will almost always be incomplete
    (you don't always transfer exact multiples of the buffer size);
  - the first response for the next request will almost always be incomplete
    (headers and not all of the data).

So if we're in this situation, this will be enough to reset the CF_STREAMER
flag (2 consecutive incomplete reads). I think it would be worth testing it.
A very simple way to test it in your environment would be to chain two
instances, one in TCP mode deciphering, and one in HTTP mode.
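
Something along these lines (a hypothetical sketch; names, addresses
and the certificate path are made up):

    # instance 1: TCP mode, only deciphers TLS
    listen tls-front
        mode tcp
        bind :443 ssl crt /etc/haproxy/site.pem
        server relay 127.0.0.1:8080

    # instance 2: HTTP mode, so the flags reset per transaction
    listen http-relay
        mode http
        bind 127.0.0.1:8080
        server app 192.168.0.10:80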

> In this instance this logic wouldn't work, since HAProxy doesn't
> have any knowledge or understanding of spdy / http/2 streams -- we'd start
> the entire connection with small records, but then eventually upgrade it to
> 16KB and keep it there, correct?

It's not kept; it really depends on the transfer sizes all along the
connection. This more or less matches what you explained at the
beginning of this thread, but driven by the transfer sizes observed
at the lower layers.

> Any clever solutions for this? And on that note, are there future plans to
> add "http/2" smarts to HAProxy, such that we can pick apart different
> streams within a session, etc?

Yes, I absolutely want to implement HTTP/2, but it will be
time-consuming and we won't have it for 1.5 at all. I also don't want
to implement SPDY or too-early drafts of 2.0, simply because whatever
we do will take a lot of time. Haproxy is a low-level component, and
each protocol adaptation is expensive to do. Not as expensive as what
people have to do with ASICs, but still harder than what some other
products can achieve by using a small library to perform the
abstraction.

One of the huge difficulties we'll face will be managing multiple
streams over one connection. I think it will change the current
paradigm of how requests are instantiated (a change which has already
started). From the very first version, we instantiated one "session"
upon accept(), and this session contains buffers onto which analyzers
are plugged; the HTTP parsers are such analyzers. All the states and
counters are stored at the session level. In 1.5, we started to change
a few things: a connection is instantiated upon accept(), then the
session is allocated after the connection is initialized (eg: SSL
handshake complete). But splitting sessions between multiple requests
will be quite complex. For example, I fear that we'll always have to
copy data, because we'll have multiple connections on one side and a
single multiplexed one on the other side. You can take a look at
doc/internal/entities.pdf if you're interested.
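
To make the current split a bit more concrete, in very rough C terms
(purely illustrative; entities.pdf shows the real picture):

    /* 1.5 model (sketch): the connection exists before the session */
    struct connection {
        int fd;                /* handshakes (eg: SSL) happen here */
        struct session *sess;  /* allocated once the connection is ready */
    };

    struct session {
        struct channel req, res; /* buffers onto which analyzers (such
                                  * as the HTTP parsers) are plugged;
                                  * all states and counters live here.
                                  * HTTP/2 would need several request
                                  * states multiplexed over a single
                                  * connection.
                                  */
    };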

Best regards,
Willy
