Re: Apache 2 cycle vs instruction profiling (2.0.40-dev)

Brian Pane Tue, 02 Jul 2002 21:53:15 -0700

On Tue, 2002-07-02 at 21:09, Bill Stoddard wrote:

> .ap_rgetline_core        protocol.c                    3.32 6      1.4 3428


I'll try the persistent-temp-brigade approach for ap_rgetline_core()
to see how much it helps.

> .get_filter_handle       util_filter.c                 4.62 2      1.1 2730

Is this being called from a custom module?

> .apr_brigade_puts        glink.s                       6.88 4      0.1 165

Does this test include your hack to bypass the normal writing of the
response header?  (Or is apr_brigade_puts() just in much better shape
now?)

> .apr_table_get           apr_tables.c                  3.26 17     1.8 4255

Here's some more information on the performance of apr_table_get():
For typical requests, we do an average of 0.25 strcasecmp() ops
per apr_table_get().  The good news is that the checksum code is
doing its job by eliminating the need for most string comparisons.
The bad news is that we're typically doing N integer checksum
comparisons for an N-element table.  Since every successful lookup
needs at least one strcasecmp(), at least 3/4 of all apr_table_get()
calls must not find a match, and each miss has to scan through the
entire table.

I can think of three ways to potentially speed up apr_table_get():

   1. Keep the basic design the same, but optimize the checksum
      computation.  We might be able to replace the 4 character
      reads from memory with a single integer read from memory
      in the common case where the start of the string is
      word-aligned.

   2. Add some form of optional indexing to the tables.

   3. Add an "is_sorted" flag to apr_table_t, and do a binary
      search in apr_table_get() if the table is sorted.  (I
      think we can get the request headers sorted for free as
      a side effect of the apr_table_overlap() implementation.)

> .apr_palloc              apr_pools.c                   2.13 95     1.5 3641

In my last round of profiling, 30+% of the apr_palloc() cycles were
related to brigade creation.  Thus the persistent-brigade hack for
ap_rgetline_core() may provide a measurable improvement here.

--Brian
