Pool allocation bottlenecks Re: Tag 2.0.21 was Re: daedalus is back on 2.0.21-dev

Brian Pane 19 Jul 2001 19:15:35 -0000

William A. Rowe, Jr. wrote:

From: "Justin Erenkrantz" <[EMAIL PROTECTED]>
Sent: Thursday, July 19, 2001 1:06 PM
I wouldn't recommend using the threaded code at all because we are
still doing a per-process allocation mutex which causes threaded to
become useless.  When that is changed (i.e. we enable SMS), I think
that threaded MPM will deserve to be beat up and tested.
-- justin

Tag and roll today, and enable SMS.  This is now a bottleneck, and no doubt
SMS will _significantly_ help us out with the threading/locking performance
issues.

It's worth noting that, for non-server-parsed content, apr_palloc
(in the original, non-SMS implementation) doesn't actually have to
acquire a lock very often.  From gprof,

                0.00    0.00      994/1588875     apr_palloc [31]
                0.00    0.00    87710/1588875     apr_file_read [5]
                0.00    0.00   500048/1588875     apr_pool_destroy <cycle 5> [145]
                0.00    0.00   500049/1588875     free_blocks [91]
                0.00    0.00   500074/1588875     apr_pool_sub_make [143]
[87]     0.0    0.00    0.00 1588875              apr_lock_acquire [87]

The numbers mean that, out of 1,588,875 calls to apr_lock_acquire,
only 994 (about 0.06%) came from apr_palloc.

For a test using server-parsed requests, the pattern is very different:

                0.00    0.00    87710/14587902    apr_file_read [9]
                0.00    0.00  3000048/14587902    apr_pool_destroy <cycle 5> [22]
                0.00    0.00  3000074/14587902    apr_pool_sub_make [31]
                0.00    0.00  4000049/14587902    free_blocks [28]
                0.00    0.00  4500021/14587902    apr_palloc [27]
[13]    25.0    0.00    0.01 14587902             apr_lock_acquire [13]

Here, apr_palloc is doing a lot of locking (4,500,021 of 14,587,902
calls to apr_lock_acquire, about 31%), so a thread-specific, lock-free
source of additional blocks for an SMS will help a lot.
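
To make "thread-specific, lock-free source of additional blocks" concrete,
here is a rough sketch in C.  It is not the SMS code: every name in it
(sketch_node, sketch_block_get, sketch_block_put, SKETCH_BLOCK_SIZE) is
made up for illustration, the block size is arbitrary, and it assumes a
C11 compiler for _Thread_local.  Each thread keeps its own free list, so
reusing a block needs no mutex.  The fallback to malloc may still lock
inside the C library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names; this is an illustration, not the APR/SMS source. */
#define SKETCH_BLOCK_SIZE 8192   /* arbitrary block size for the sketch */

/* A free block is reused as a list node while it sits on the free list. */
typedef struct sketch_node {
    struct sketch_node *next;
} sketch_node;

/* One free list per thread: no other thread can touch it, so no lock. */
static _Thread_local sketch_node *sketch_free_list = NULL;

/* Take a block from this thread's free list, or fall back to malloc. */
static void *sketch_block_get(void)
{
    sketch_node *n = sketch_free_list;
    if (n != NULL) {
        sketch_free_list = n->next;   /* pop without locking */
        return n;
    }
    return malloc(SKETCH_BLOCK_SIZE); /* slow path; may lock internally */
}

/* Return a block to this thread's free list for later reuse. */
static void sketch_block_put(void *block)
{
    sketch_node *n = block;
    assert(block != NULL);
    n->next = sketch_free_list;       /* push without locking */
    sketch_free_list = n;
}
```

A block must be returned on the same thread that will reuse it; handing
blocks between threads would need extra machinery that this sketch leaves
out.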

Some thoughts based on the numbers:

 * For anybody working on tuning the SMS implementation, I highly
   recommend incorporating mod_include into your test cases.

 * Creating and destroying pools is the major bottleneck for
   non-server-parsed requests.  In order to achieve big speedups
   in the httpd, the SMS implementation needs to make sub-pool
   creation and destruction faster than the original pool design.

 * In the non-server-parsed case, apr_palloc is one of the most
   time-consuming functions in the httpd.  Keep in mind that it
   almost never (in this test case) has to acquire a lock and
   call new_block; instead, it's usually taking the fast path
   through the code that requires just a few arithmetic and pointer
   operations.  While it's probably possible to tune the code
   a bit, it's arguably close to optimal already.  What this
   means to me is that the real optimization opportunity for
   non-server-parsed content is not to make apr_palloc faster,
   but rather to stop calling apr_palloc so much.
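
For readers who haven't looked at the allocator, the "fast path" in the
last point can be pictured roughly as follows.  This is a simplified
sketch, not the actual apr_palloc source: the struct, the function names,
and the 8-byte alignment are all assumptions made for illustration.  When
the current block has room, an allocation is an alignment round-up, one
comparison, and a pointer bump.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; a simplified picture, not the APR source. */
typedef struct {
    char *first_avail;   /* next free byte in the current block */
    char *end;           /* one past the last usable byte */
} sketch_block;

/* Round a request up to a multiple of 8 bytes (alignment is assumed). */
static size_t sketch_align(size_t n)
{
    return (n + 7u) & ~(size_t)7u;
}

/* Fast path only.  Returns NULL when the block cannot satisfy the
 * request; a real allocator would then take the slow path (acquire the
 * lock and fetch a new block). */
static void *sketch_alloc(sketch_block *b, size_t n)
{
    size_t size = sketch_align(n);
    void *p;

    assert(b != NULL);
    /* size < n catches overflow in the round-up. */
    if (size < n || size > (size_t)(b->end - b->first_avail))
        return NULL;
    p = b->first_avail;
    b->first_avail += size;           /* bump the pointer */
    return p;
}
```

With so little work per call, shaving instructions here gains little, so
the remaining lever is the number of calls.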

--Brian
