Re: [RFC] haterm: reduce response-path overhead for large responses

Willy Tarreau Mon, 16 Mar 2026 03:34:17 -0700

On Mon, Mar 16, 2026 at 11:02:54AM +0100, Aleksandar Lazic wrote:
> >    - the time measurement is not correct actually, it reports the requested
> >      time while the purpose was to indicate the generation time. It's useful
> >      when you don't know if you're measuring haterm's internal latency or
> >      network latency. I've used this a lot with httpterm in the past, where
> >      latencies of several milliseconds could happen on a saturated machine,
> >      and seeing the server denounce itself as the culprit was definitely
> >      helpful!
> 
> Thanks for the explanation, it was not clear to me which "time" should be here.
> So the "time" here should be "/?t=<time>" or something else?


It's the *measured* response time. I mean, the client sends /?t=10 to ask
haterm to wait for 10ms before responding, and haterm tries to wait 10ms,
but due to contention with other pending work, it might end up being 11
or 12ms. What this header is supposed to do is indicate how long it really
waited since the request was received. Most of the time it's the same value
as requested, but when it's significantly higher, you know the server is
undersized and is degrading the measurements.
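
As a rough illustration of the difference between the requested and the
measured time, here is a minimal standalone sketch. It is not haterm's
actual code: the function names (now_ns, wait_and_measure_ms, overrun_ms)
are hypothetical, and it simply sleeps with nanosleep() instead of using
an event loop timer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Sleep for <req_ms> milliseconds and return how long the wait really
 * took, in whole milliseconds (rounded down). Hypothetical helper.
 */
static uint64_t wait_and_measure_ms(unsigned int req_ms)
{
    struct timespec delay = {
        .tv_sec  = req_ms / 1000,
        .tv_nsec = (long)(req_ms % 1000) * 1000000L,
    };
    uint64_t start = now_ns();

    /* restart the sleep with the remaining time if a signal interrupts it */
    while (nanosleep(&delay, &delay) == -1 && errno == EINTR)
        ;
    return (now_ns() - start) / 1000000ULL;
}

/* Extra delay added on the server side, in ms (0 if there is none). */
static uint64_t overrun_ms(unsigned int req_ms, uint64_t measured_ms)
{
    return measured_ms > req_ms ? measured_ms - req_ms : 0;
}
```

On an idle machine overrun_ms(10, wait_and_measure_ms(10)) will usually be
0; a value that stays well above that points at the server rather than at
the network.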

> >    - the last patch creating the loop to try to better fill the target
> >      buffer should theoretically not change anything, yet it does. On
> >      the AMD it degrades the performance by an extra 2-3%, while on the
> >      ARM it brings roughly 3%.
> 
> That's strange, I also did not expect that.

Christopher just explained to me what this does. By default htx_add_data()
will not wrap data, so it performs a single copy. This explains why we're
observing a difference with and without the loop. But it also means that
with the loop we have fragmented data in the buffer, and being forced to
work with such fragmented data is inefficient, which can explain the
variation we're observing depending on the machine. I'd rather see if I can
avoid sending again until the htx buffer is completely empty, so as to only
work with full buffers. It's more efficient and guarantees that data remain
aligned. I'd also like us to implement splicing like we had in httpterm.
It will require some low-level changes in haproxy because for now
splicing is only part of the forwarding phase, but we have ideas on how to
do that, and it would bring us kTLS and hw acceleration for free ;-)
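
To make the wrapping effect more concrete, here is a toy sketch. It uses a
plain byte ring, which is an assumption made for illustration only: it is
not the HTX layout and none of these names (struct ring, ring_add_nowrap,
ring_add_loop, ring_del, ring_is_fragmented) exist in haproxy. A single
non-wrapping copy leaves the data contiguous, a loop around it fills the
buffer better but splits the data in two parts, and only consuming
everything realigns the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 16

struct ring {
    char   area[RING_SIZE];
    size_t head;   /* offset of the first used byte */
    size_t len;    /* number of used bytes */
};

/* Copy at most <len> bytes with a single memcpy(), stopping at the end of
 * the storage area instead of wrapping. Returns the number of bytes copied.
 */
static size_t ring_add_nowrap(struct ring *r, const char *src, size_t len)
{
    size_t tail       = (r->head + r->len) % RING_SIZE;
    size_t free_total = RING_SIZE - r->len;
    size_t contig     = RING_SIZE - tail; /* room before the end of area */

    if (contig > free_total)
        contig = free_total;
    if (len > contig)
        len = contig;
    memcpy(r->area + tail, src, len);
    r->len += len;
    return len;
}

/* Call the non-wrapping copy in a loop until everything is copied or the
 * ring is full. This fills the ring better but may wrap the data.
 */
static size_t ring_add_loop(struct ring *r, const char *src, size_t len)
{
    size_t done = 0;

    while (done < len) {
        size_t ret = ring_add_nowrap(r, src + done, len - done);

        if (!ret)
            break;
        done += ret;
    }
    return done;
}

/* Non-zero if the used bytes are split in two parts (wrapped). */
static int ring_is_fragmented(const struct ring *r)
{
    return r->head + r->len > RING_SIZE;
}

/* Consume <len> bytes; an empty ring is realigned to offset 0. */
static void ring_del(struct ring *r, size_t len)
{
    if (len > r->len)
        len = r->len;
    r->head = (r->head + len) % RING_SIZE;
    r->len -= len;
    if (!r->len)
        r->head = 0;
}
```

In this toy model, waiting for the ring to be completely empty before
refilling means every refill starts at offset 0 and fits in one copy, which
is the "only work with full buffers" idea described above.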

> With these lessons learned I will still use HATerm for my server benchmarks
> just because it offers h1+h2+h3 as target for the reverse proxies :-)

Yes that's the main point, it does help a lot!

> > If you want I can already merge your first patch (snprintf) as it's
> > definitely useful.
> 
> Yes, thanks.

OK will do, thanks!
Willy
