William A. Rowe, Jr. wrote:
From: "Justin Erenkrantz" <[EMAIL PROTECTED]> Sent: Thursday, July 19, 2001 1:06 PM
I wouldn't recommend using the threaded code at all because we are still
doing a per-process allocation mutex, which makes the threaded MPM
useless. When that is changed (i.e. we enable SMS), I think the threaded MPM will deserve to be beaten up and tested. -- justin
Tag and roll today, and enable SMS. This is now a bottleneck, and no doubt SMS will _significantly_ help us out with the threading/locking performance issues.
It's worth noting that, for non-server-parsed content, apr_palloc (in the original, non-SMS implementation) doesn't actually have to acquire a lock very often. From gprof,
index  % time   self  children      called         name
                0.00    0.00        994/1588875     apr_palloc [31]
                0.00    0.00      87710/1588875     apr_file_read [5]
                0.00    0.00     500048/1588875     apr_pool_destroy <cycle 5> [145]
                0.00    0.00     500049/1588875     free_blocks [91]
                0.00    0.00     500074/1588875     apr_pool_sub_make [143]
[87]      0.0   0.00    0.00    1588875             apr_lock_acquire [87]
The numbers mean that, out of 1,588,875 calls to apr_lock_acquire, only 994 came from apr_palloc.
For a test using server-parsed requests, the pattern is very different:
index  % time   self  children      called          name
                0.00    0.00      87710/14587902     apr_file_read [9]
                0.00    0.00    3000048/14587902     apr_pool_destroy <cycle 5> [22]
                0.00    0.00    3000074/14587902     apr_pool_sub_make [31]
                0.00    0.00    4000049/14587902     free_blocks [28]
                0.00    0.00    4500021/14587902     apr_palloc [27]
[13]     25.0   0.00    0.01   14587902              apr_lock_acquire [13]
Here, apr_palloc is doing a lot of locking, so a thread-specific, lock-free source of additional blocks for SMS will help a lot.
Some thoughts based on the numbers:
* For anybody working on tuning the SMS implementation, I highly recommend incorporating mod_include into your test cases.
* Creating and destroying pools is the major bottleneck for non-server-parsed requests. To achieve big speedups in the httpd, the SMS implementation needs to make sub-pool creation and destruction faster than in the original pool design.
* In the non-server-parsed case, apr_palloc is one of the most time-consuming functions in the httpd. Keep in mind that it almost never (in this test case) has to acquire a lock and call new_block; instead, it's usually taking the fast path through the code that requires just a few arithmetic and pointer operations. While it's probably possible to tune the code a bit, it's arguably close to optimal already. What this means to me is that the real optimization opportunity for non-server-parsed content is not to make apr_palloc faster, but rather to stop calling apr_palloc so much.
--Brian