Roy T. Fielding wrote:
A better optimization might be to reduce the number of calls to brigade_puts. That's how much of 1.3 was improved.
I only know of three ways to reduce the number of apr_brigade_puts() calls in 2.0:
* Send fewer fields in the HTTP response header.
* Do more buffering prior to calling apr_brigade_puts(). (This is what 2.0 used to do, and it was even slower, because it added yet another layer of memory copying before the socket write.)
* Produce a separate bucket for each field in the response header, and rely on writev() to patch them together. (This won't work in 2.0 as it stands: if the number of tiny buckets grows too large, core_output_filter() will try to consolidate them into a single bucket, with the associated memcpy cost.)
Were you thinking of a different approach from these?
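To make the third option concrete, here is a minimal sketch of the gather-write idea in plain POSIX C. It is not httpd or APR code: send_header_fields() and MAX_HEADER_FIELDS are hypothetical names I made up for illustration, and it leaves out the bucket brigade entirely. Each header field stays in its own buffer and a single writev() call stitches them together at the socket, so no field is copied into an intermediate buffer first.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical limit for this sketch; real code would have to chunk the
 * list (or coalesce) once it exceeds what one writev() call accepts. */
#define MAX_HEADER_FIELDS 16

/* Hypothetical helper, not part of APR or httpd: write every header
 * field with one gathered writev() call instead of copying the fields
 * into a single buffer first. Returns the byte count from writev(),
 * or -1 if the field count is out of range. */
ssize_t send_header_fields(int fd, const char *const *fields, int nfields)
{
    struct iovec iov[MAX_HEADER_FIELDS];
    int i;

    if (nfields < 1 || nfields > MAX_HEADER_FIELDS) {
        return -1;
    }
    for (i = 0; i < nfields; i++) {
        /* writev() only reads from iov_base, so dropping const is safe */
        iov[i].iov_base = (void *)fields[i];
        iov[i].iov_len = strlen(fields[i]);
    }
    return writev(fd, iov, nfields);
}
```

Called with an array such as {"HTTP/1.1 200 OK\r\n", "Content-Length: 0\r\n", "\r\n"}, this costs one system call and no extra memcpy. The catch is the one noted above: the approach only pays off while the number of pieces stays small enough that nothing downstream decides to consolidate them.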
--Brian