On Tue, 2002-07-02 at 21:09, Bill Stoddard wrote:
> .ap_rgetline_core protocol.c 3.32 6 1.4 3428
I'll try the persistent-temp-brigade approach for ap_rgetline_core()
to see how much it helps.
> .get_filter_handle util_filter.c 4.62 2 1.1 2730
Is this being called from a custom module?
> .apr_brigade_puts glink.s 6.88 4 0.1 165
Does this test include your hack to bypass the normal writing of the
response header? (Or is apr_brigade_puts() just in much better shape
now?)
> .apr_table_get apr_tables.c 3.26 17 1.8 4255
Here's some more information on the performance of apr_table_get():
For typical requests, we do an average of 0.25 strcasecmp() ops
per apr_table_get(). The good news is that the checksum code is
doing its job by eliminating the need for most string comparisons.
The bad news is that we're typically doing N integer checksum
comparisons for an N-element table. With only 0.25 string
comparisons per call, at least 3/4 of all apr_table_get() calls
don't find a match, which means that we have to scan through the
entire table.
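
To make the cost pattern concrete, here is a minimal sketch of a
checksum-filtered lookup. It is only modeled loosely on the idea
described above; demo_entry_t, demo_checksum() and demo_table_get()
are hypothetical names, not the real APR structures or functions,
and the two counters exist purely to show where the work goes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <strings.h>  /* strcasecmp() (POSIX) */

/* Hypothetical, simplified entry; not the real apr_table_entry_t. */
typedef struct {
    const char *key;
    const char *val;
    uint32_t checksum;  /* case-folded first four bytes of key */
} demo_entry_t;

/* Clearing bit 0x20 of each byte makes 'a'..'z' equal 'A'..'Z'. */
#define DEMO_CASE_MASK 0xdfdfdfdfu

/* Pack up to four leading bytes of key into one integer,
 * zero-padding keys shorter than four bytes. */
uint32_t demo_checksum(const char *key)
{
    uint32_t sum = 0;
    int i;

    assert(key != NULL);
    for (i = 0; i < 4; i++) {
        sum <<= 8;
        if (*key != '\0') {
            sum |= (unsigned char)*key++;
        }
    }
    return sum & DEMO_CASE_MASK;
}

/* Linear scan: one integer compare per element, and a string
 * compare only when the checksums agree. */
const char *demo_table_get(const demo_entry_t *elts, size_t nelts,
                           const char *key,
                           size_t *int_cmps, size_t *str_cmps)
{
    uint32_t sum = demo_checksum(key);
    size_t i;

    for (i = 0; i < nelts; i++) {
        ++*int_cmps;
        if (elts[i].checksum == sum) {
            ++*str_cmps;
            if (strcasecmp(elts[i].key, key) == 0) {
                return elts[i].val;
            }
        }
    }
    return NULL;  /* a miss always costs nelts integer compares */
}
```

In this sketch, a lookup for a header that isn't in the table does
zero string comparisons but still touches every element, which is
the N-compares-per-miss behavior described above.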
I can think of three ways to potentially speed up apr_table_get():

  1. Keep the basic design the same, but optimize the checksum
     computation. We might be able to replace the 4 character
     reads from memory with a single integer read from memory
     in the common case where the start of the string is
     word-aligned.

  2. Add some form of optional indexing to the tables.

  3. Add an "is_sorted" flag to apr_table_t, and do a binary
     search in apr_table_get() if the table is sorted. (I
     think we can get the request headers sorted for free as
     a side effect of the apr_table_overlap() implementation.)

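
As a rough illustration of option 3, a lookup with an is_sorted
flag might look something like the sketch below. Everything here is
hypothetical (sketch_entry_t, sketch_table_t, sketch_table_get());
it is not the apr_table_t layout, and it assumes the table has been
sorted case-insensitively by key before the flag is set.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>  /* strcasecmp() (POSIX) */

/* Hypothetical, simplified types; not the real apr_table_t. */
typedef struct {
    const char *key;
    const char *val;
} sketch_entry_t;

typedef struct {
    const sketch_entry_t *elts;
    size_t nelts;
    int is_sorted;  /* nonzero: elts are ordered by strcasecmp() */
} sketch_table_t;

/* Returns the value of the first entry matching key, or NULL.
 * *cmps counts string comparisons, for illustration only. */
const char *sketch_table_get(const sketch_table_t *t, const char *key,
                             size_t *cmps)
{
    size_t lo = 0, hi, i;

    assert(t != NULL && key != NULL);
    hi = t->nelts;

    if (!t->is_sorted) {
        /* Fallback: plain linear scan over an unsorted table. */
        for (i = 0; i < t->nelts; i++) {
            ++*cmps;
            if (strcasecmp(t->elts[i].key, key) == 0) {
                return t->elts[i].val;
            }
        }
        return NULL;
    }

    /* Lower-bound binary search: find the first key >= target, so
     * duplicate keys still resolve to the earliest entry. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        ++*cmps;
        if (strcasecmp(t->elts[mid].key, key) < 0) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (lo < t->nelts) {
        ++*cmps;
        if (strcasecmp(t->elts[lo].key, key) == 0) {
            return t->elts[lo].val;
        }
    }
    return NULL;
}
```

The tradeoff in this sketch is that a miss costs O(log N) string
comparisons instead of N integer comparisons. Whether that is a win
for small header tables would need measuring; sorting on the
checksum first and comparing strings only on a tie is another
variant one could try.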
> .apr_palloc apr_pools.c 2.13 95 1.5 3641
In my last round of profiling, more than 30% of the apr_palloc()
cycles were related to brigade creation. Thus the persistent-brigade
hack for ap_rgetline_core() may provide a measurable improvement
here as well.
--Brian