Justin Erenkrantz wrote:

>On Fri, Sep 28, 2001 at 04:57:05PM -0700, Brian Pane wrote:
>
>>The notes following the profile offer my interpretation of
>>the numbers.
>>
>>                             % of total    cumulative %
>>rank function                 CPU time    of total CPU time
>>---- ------------------       ----------  -----------------
>> 1. munmap                     18.16
>> 2. _writev                    14.00
>> 3. mmap                       11.96
>> 4. _so_accept                  5.71
>> 5. bndm                        4.82
>> 6. _close                      4.76
>> 7. _so_shutdown                4.16
>> 8. __open                      3.18
>> 9. _write                      2.61
>>10. memset                      2.38
>>                                             71.74
>>
>
>Most of these look related to OS things that we can't control.
>

Yes.  Or, to put it another way, the implementation efficiency is
approaching its theoretical optimum for this architecture.  That's
not a bad thing. :-)
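
As a rough check on that reading, here is a small Python sketch that
splits the top-10 entries into OS calls and user-space code.  The
percentages are the ones from the profile; the split itself is an
assumption on my part (bndm and memset counted as user-space,
everything else as system calls), and the names are made up for
illustration.

```python
# Top-10 profile entries: (function, % of total CPU time), copied from the profile.
PROFILE = [
    ("munmap", 18.16),
    ("_writev", 14.00),
    ("mmap", 11.96),
    ("_so_accept", 5.71),
    ("bndm", 4.82),
    ("_close", 4.76),
    ("_so_shutdown", 4.16),
    ("__open", 3.18),
    ("_write", 2.61),
    ("memset", 2.38),
]

# Assumption: these two run in user space; the rest are treated as OS calls.
USER_SPACE = {"bndm", "memset"}


def split_cpu_share(profile, user_space):
    """Return (os_pct, user_pct), each summed over the profile and rounded."""
    os_pct = sum(pct for name, pct in profile if name not in user_space)
    user_pct = sum(pct for name, pct in profile if name in user_space)
    return round(os_pct, 2), round(user_pct, 2)


os_pct, user_pct = split_cpu_share(PROFILE, USER_SPACE)
total = round(os_pct + user_pct, 2)
print(f"OS calls: {os_pct}%  user-space: {user_pct}%  top-10 total: {total}%")
```

Under that assumed split, about 64.5 of the 71.74 percentage points
in the top ten go to OS calls, and about 7.2 to user-space code.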

>As you said, bndm is ~4x better than its predecessor.  Unless
>we were to precompute mod_include files or replace its parsing
>algorithm with something better (which I think is possible),
>I don't think there is much we can do here.
>

Agreed, there probably isn't much more that can be done.  I think
2.0's SSI parser is very close to optimal now.

>I'm surprised _so_accept, _close, and _so_shutdown are up there
>in CPU usage.  I also know that gstein has gone on record in his
>advogato diary that mmap may not be the performance win we
>think it may be.  It may be worth trying to compile without
>mmap and see how we perform.  Also, I'm guessing this is on
>Solaris 8?  Any chance we could get the sendfilev patch on
>there?  (I know it is part of 7/01 and Sol9 as well.)
>

I'll see if we can get some more comparative numbers this week...

>Out of curiosity, how is the performance relative to 1.3
>for similar loads or URLs?  -- justin
>

I just ran a quick comparison, using ab with concurrency==1
over the loopback on uniprocessor Linux (so these results show
speed, but not scalability).

  50KB non-SSI HTML file:     1141 requests/s with 2.0.26-dev
                               948 requests/s with 1.3.20

  SHTML file with 3 includes:  411 requests/s with 2.0.26-dev
   (total size 50KB)           200 requests/s with 1.3.20

  0KB non-SSI HTML file:      2764 requests/s with 2.0.26-dev
                              2937 requests/s with 1.3.20

My interpretation of these results is that 2.0's basic request
handling is a bit less efficient than 1.3's, as shown by the
0KB test, but its use of sendfile for non-SSI requests and
much-improved mod_include code for SSI requests give it an
advantage in the general case.
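
For reference, the relative differences implied by those numbers can
be worked out with a few lines of Python.  This is only arithmetic on
the requests/s figures above; the helper name is made up for
illustration.

```python
# Requests/s from the ab runs above: (test, 2.0.26-dev, 1.3.20).
RESULTS = [
    ("50KB non-SSI HTML", 1141, 948),
    ("SHTML with 3 includes", 411, 200),
    ("0KB non-SSI HTML", 2764, 2937),
]


def percent_change(new, old):
    """Percent change in throughput going from old to new."""
    return (new - old) / old * 100.0


for name, v20, v13 in RESULTS:
    # Positive means 2.0.26-dev served more requests/s than 1.3.20.
    print(f"{name:24s} {percent_change(v20, v13):+7.1f}%  (2.0 vs 1.3)")
```

That works out to roughly +20% for the 50KB static file, about +105%
for the SSI page, and about -6% for the 0KB file.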

--Brian

