William A. Rowe, Jr. wrote:

From: "Justin Erenkrantz" <[EMAIL PROTECTED]>
Sent: Thursday, July 19, 2001 1:06 PM


I wouldn't recommend using the threaded code at all because we are still
doing a per-process allocation mutex which causes threaded to become
useless. When that is changed (i.e. we enable SMS), I think that threaded
MPM will deserve to be beat up and tested.
-- justin



Tag and roll today, and enable SMS. This is now a bottleneck, and no
doubt SMS will _significantly_ help us out with the threading/locking
performance issues.

It's worth noting that, for non-server-parsed content, apr_palloc
(in the original, non-SMS implementation) doesn't actually have to
acquire a lock very often.  From gprof,

                0.00    0.00     994/1588875     apr_palloc [31]
                0.00    0.00   87710/1588875     apr_file_read [5]
                0.00    0.00  500048/1588875     apr_pool_destroy <cycle 5> [145]
                0.00    0.00  500049/1588875     free_blocks [91]
                0.00    0.00  500074/1588875     apr_pool_sub_make [143]
[87]     0.0    0.00    0.00 1588875             apr_lock_acquire [87]


The numbers mean that, out of 1,588,875 calls to apr_lock_acquire,
only 994 came from apr_palloc.

For a test using server-parsed requests, the pattern is very different:
                0.00    0.00    87710/14587902     apr_file_read [9]
                0.00    0.00  3000048/14587902     apr_pool_destroy <cycle 5> [22]
                0.00    0.00  3000074/14587902     apr_pool_sub_make [31]
                0.00    0.00  4000049/14587902     free_blocks [28]
                0.00    0.00  4500021/14587902     apr_palloc [27]
[13]    25.0    0.00    0.01 14587902              apr_lock_acquire [13]


Here, apr_palloc is doing a lot of locking, so a thread-specific, lock-free
source of additional blocks for an SMS will help a lot.
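
To put the two profiles side by side, the share of apr_lock_acquire calls
that come from apr_palloc can be worked out from the called/total columns
above. The snippet below is only a small arithmetic sketch; call_share_pct
and print_palloc_lock_share are hypothetical helper names, not part of APR
or gprof.

```c
#include <stdio.h>

/* Hypothetical helper (not part of APR or gprof): percentage of the
 * total call count that came from one caller. */
static double call_share_pct(unsigned long from_caller, unsigned long total)
{
    if (total == 0) {
        return 0.0;   /* avoid dividing by zero on an empty profile */
    }
    return 100.0 * (double)from_caller / (double)total;
}

/* Prints the apr_palloc share of lock acquisitions for both profiles,
 * using the counts quoted in the gprof output above. */
static void print_palloc_lock_share(void)
{
    printf("non-server-parsed: %.2f%%\n",
           call_share_pct(994UL, 1588875UL));
    printf("server-parsed:     %.2f%%\n",
           call_share_pct(4500021UL, 14587902UL));
}
```

That works out to roughly 0.06% of the lock acquisitions in the
non-server-parsed run, against roughly 31% in the server-parsed run.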

Some thoughts based on the numbers:

 * For anybody working on tuning the SMS implementation, I highly
   recommend incorporating mod_include into your test cases.

 * Creating and destroying pools is the major bottleneck for
   non-server-parsed requests.  In order to achieve big speedups
   in the httpd, the SMS implementation needs to make sub-pool
   creation and destruction faster than the original pool design.

 * In the non-server-parsed case, apr_palloc is one of the most
   time-consuming functions in the httpd.  Keep in mind that it
   almost never (in this test case) has to acquire a lock and
   call new_block; instead, it's usually taking the fast path
   through the code that requires just a few arithmetic and pointer
   operations.  While it's probably possible to tune the code
   a bit, it's arguably close to optimal already.  What this
   means to me is that the real optimization opportunity for
   non-server-parsed content is not to make apr_palloc faster,
   but rather to stop calling apr_palloc so much.
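
To illustrate the fast-path/slow-path split described in the last point,
here is a deliberately simplified bump allocator. It is NOT the actual
apr_palloc code: every toy_* name, the 8192-byte block size, and the 8-byte
alignment are assumptions made up for this sketch. The slow path is where
the real pool code would take the allocation lock and call new_block; the
sketch just counts how often that path is reached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BLOCK_SIZE 8192u   /* assumed block size, for illustration */
#define TOY_ALIGN      8u      /* assumed alignment, for illustration  */
#define TOY_ALIGN_UP(n) (((n) + (TOY_ALIGN - 1u)) & ~(size_t)(TOY_ALIGN - 1u))

typedef struct toy_block {
    struct toy_block *next;
    char *first_avail;         /* next free byte in this block */
    char *endp;                /* one past the end of the block */
} toy_block;

typedef struct toy_pool {
    toy_block *blocks;              /* most recently added block first */
    unsigned long slow_path_calls;  /* times a new block was needed */
} toy_pool;

/* Slow-path helper: obtain a fresh block from the system allocator. */
static toy_block *toy_new_block(size_t min_size)
{
    size_t size = min_size > TOY_BLOCK_SIZE ? min_size : TOY_BLOCK_SIZE;
    size_t hdr = TOY_ALIGN_UP(sizeof(toy_block));
    toy_block *b = malloc(hdr + size);
    if (b == NULL) {
        return NULL;
    }
    b->next = NULL;
    b->first_avail = (char *)b + hdr;
    b->endp = b->first_avail + size;
    return b;
}

static void *toy_palloc(toy_pool *p, size_t reqsize)
{
    size_t size = TOY_ALIGN_UP(reqsize);
    toy_block *b = p->blocks;
    void *mem;

    if (size < reqsize) {
        return NULL;               /* size overflowed while rounding up */
    }

    /* Fast path: one comparison and one pointer bump, no lock needed. */
    if (b != NULL && size <= (size_t)(b->endp - b->first_avail)) {
        mem = b->first_avail;
        b->first_avail += size;
        return mem;
    }

    /* Slow path: a real pool would acquire the allocation lock here. */
    p->slow_path_calls++;
    b = toy_new_block(size);
    if (b == NULL) {
        return NULL;
    }
    b->next = p->blocks;
    p->blocks = b;
    mem = b->first_avail;
    b->first_avail += size;
    return mem;
}

static void toy_pool_destroy(toy_pool *p)
{
    toy_block *b = p->blocks;
    while (b != NULL) {
        toy_block *next = b->next;
        free(b);
        b = next;
    }
    p->blocks = NULL;
}

/* Makes n allocations of the given size and reports how many of them
 * had to take the slow path. */
static unsigned long toy_count_slow_paths(unsigned long n, size_t size)
{
    toy_pool p = { NULL, 0 };
    unsigned long i, slow;

    for (i = 0; i < n; i++) {
        void *mem = toy_palloc(&p, size);
        assert(mem != NULL);
    }
    slow = p.slow_path_calls;
    toy_pool_destroy(&p);
    return slow;
}
```

With these assumed sizes, 1000 allocations of 16 bytes each reach the slow
path only twice. Once the fast path is that cheap, the remaining cost is
mostly a matter of how many times the allocator gets called.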

--Brian




